Our TTS models satisfy the following criteria:
- Fully end-to-end;
- Large library of voices;
- Natural-sounding speech;
- One-line usage, minimal, portable;
- Impressively fast on CPU and GPU;
- For the Russian language - automated stress and homographs;
You can use our models in 3 flavours:

- Via PyTorch Hub: `torch.hub.load()`;
- Via pip: `pip install silero` and then `from silero import silero_tts`;
- Via caching the required models and utils manually and modifying them if necessary;
Both the pip package and PyTorch Hub download models on demand. If you need caching, do it manually or invoke the required model once (it will be downloaded to a cache folder). Please see these docs for more information.
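Manual caching can be as simple as downloading a model file once and reusing the local copy. The sketch below is one possible approach using only the standard library; the helper name `ensure_cached`, the cache directory, and the injectable `fetch` callable are illustrative assumptions and not part of the Silero package.

```python
import os
import urllib.request
from typing import Callable


def ensure_cached(url: str,
                  cache_dir: str,
                  fetch: Callable[[str, str], object] = urllib.request.urlretrieve) -> str:
    """Return a local path for `url`, downloading it only if it is not cached yet.

    `ensure_cached` is a hypothetical helper, not a Silero API.
    """
    os.makedirs(cache_dir, exist_ok=True)
    local_path = os.path.join(cache_dir, os.path.basename(url))
    if not os.path.isfile(local_path):
        # Download to a temporary name first so that an interrupted download
        # never leaves a truncated file under the final name.
        tmp_path = local_path + ".part"
        fetch(url, tmp_path)
        os.replace(tmp_path, local_path)
    return local_path
```

For example, `ensure_cached('https://models.silero.ai/models/tts/ru/v5_ru.pt', 'silero_cache')` would fetch the file on the first call and only return the cached path on later calls.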
PyTorch Hub and the pip package are based on the same code. All of the `torch.hub.load` examples can be used with the pip package via this basic change:

    from silero import silero_tts

    # Same arguments as in the torch.hub.load examples
    model, example_text = silero_tts(language='ru',
                                     speaker='v5_ru')
    audio = model.apply_tts(text=example_text)

All of the provided models are listed in the models.yml file. Any metadata and newer versions will be added there.
V5 models support SSML. Also see Colab examples for main SSML tag usage.
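As a rough illustration of what SSML input can look like, the sketch below assembles a small SSML document with the standard library. The tags used (`<speak>`, `<s>`, `<break>`) come from the general SSML specification; which tags and attributes a given model honors, and the `ssml_text` argument mentioned in the comment, are assumptions to verify against the Colab examples. `build_ssml` is a hypothetical helper.

```python
from xml.sax.saxutils import escape


def build_ssml(sentences, pause_ms=500):
    """Wrap plain sentences in <speak>/<s> tags with a pause between them."""
    parts = []
    for i, sentence in enumerate(sentences):
        if i > 0:
            parts.append(f'<break time="{int(pause_ms)}ms"/>')
        # Escape &, < and > so the text cannot break the markup
        parts.append(f"<s>{escape(sentence)}</s>")
    return "<speak>" + "".join(parts) + "</speak>"


ssml = build_ssml(["Hello & welcome.", "This is a test."])
print(ssml)
# Assumption: the resulting string would then be passed to the model as
# SSML input, e.g. model.apply_tts(ssml_text=ssml, speaker=...)
```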
Russian-only models support automated stress and homographs.
| ID | Speakers | Auto-stress / Homographs | Language | SR | Colab |
|---|---|---|---|---|---|
| v5_ru | aidar, baya, kseniya, xenia, eugene | yes / yes | ru (Russian) | 8000, 24000, 48000 | |
V4 models support SSML. Also see Colab examples for main SSML tag usage.
V4 models: v4_ru, v4_cyrillic, v4_ua, v4_uz, v4_indic
| ID | Speakers | Auto-stress | Language | SR | Colab |
|---|---|---|---|---|---|
| v4_ru | aidar, baya, kseniya, xenia, eugene, random | yes | ru (Russian) | 8000, 24000, 48000 | |
| v4_cyrillic | b_ava, marat_tt, kalmyk_erdni... | no | cyrillic (Avar, Tatar, Kalmyk, ...) | 8000, 24000, 48000 | |
| v4_ua | mykyta, random | no | ua (Ukrainian) | 8000, 24000, 48000 | |
| v4_uz | dilnavoz | no | uz (Uzbek) | 8000, 24000, 48000 | |
| v4_indic | hindi_male, hindi_female, ..., random | no | indic (Hindi, Telugu, ...) | 8000, 24000, 48000 | |
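Since each model accepts only the speakers and sample rates listed in the table above, it can help to validate arguments before synthesis. A minimal sketch, assuming the values from that table; only the models whose speakers are listed in full are included, and `SUPPORTED` and `check_tts_args` are hypothetical names, not part of the library.

```python
SAMPLE_RATES = (8000, 24000, 48000)

# Speaker lists copied from the table; v4_cyrillic and v4_indic are left out
# because their speaker lists are abbreviated there.
SUPPORTED = {
    "v4_ru": ("aidar", "baya", "kseniya", "xenia", "eugene", "random"),
    "v4_ua": ("mykyta", "random"),
    "v4_uz": ("dilnavoz",),
}


def check_tts_args(model_id, speaker, sample_rate):
    """Raise ValueError if the combination is not listed in the table."""
    if model_id not in SUPPORTED:
        raise ValueError(f"unknown model: {model_id!r}")
    if speaker not in SUPPORTED[model_id]:
        raise ValueError(f"{model_id} has no speaker {speaker!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")


check_tts_args("v4_ru", "xenia", 48000)  # passes silently
```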
V3 models support SSML. Also see Colab examples for main SSML tag usage.
V3 models: v3_en, v3_en_indic, v3_de, v3_es, v3_fr, v3_indic
| ID | Speakers | Auto-stress | Language | SR | Colab |
|---|---|---|---|---|---|
| v3_en | en_0, en_1, ..., en_117, random | no | en (English) | 8000, 24000, 48000 | |
| v3_en_indic | tamil_female, ..., assamese_male, random | no | en (English) | 8000, 24000, 48000 | |
| v3_de | eva_k, ..., karlsson, random | no | de (German) | 8000, 24000, 48000 | |
| v3_es | es_0, es_1, es_2, random | no | es (Spanish) | 8000, 24000, 48000 | |
| v3_fr | fr_0, ..., fr_5, random | no | fr (French) | 8000, 24000, 48000 | |
| v3_indic | hindi_male, hindi_female, ..., random | no | indic (Hindi, Telugu, ...) | 8000, 24000, 48000 | |
Basic dependencies for Colab examples:

- `torch`, 1.10+ for v3 models / 2.0+ for v4 and v5 models;
- `torchaudio`, the latest version bound to PyTorch should work (required only because the models are hosted together with STT; TTS itself does not need it);
- `omegaconf`, latest (can be removed as well, if you do not load all of the configs);
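The version requirements above can also be checked programmatically. A minimal sketch, assuming model IDs start with their generation prefix (`v3_`, `v4_`, `v5_`) as in the tables; `MIN_TORCH` and `torch_is_new_enough` are hypothetical names.

```python
import re

# Minimum PyTorch version per model generation, as listed above
MIN_TORCH = {"v3": (1, 10), "v4": (2, 0), "v5": (2, 0)}


def torch_is_new_enough(model_id, torch_version):
    """Compare a version string such as '2.1.0+cpu' with the required minimum."""
    generation = model_id.split("_", 1)[0]
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if generation not in MIN_TORCH or match is None:
        raise ValueError(f"cannot check {model_id!r} against {torch_version!r}")
    return (int(match.group(1)), int(match.group(2))) >= MIN_TORCH[generation]


print(torch_is_new_enough("v5_ru", "2.1.0+cpu"))  # True
print(torch_is_new_enough("v4_ru", "1.13.1"))     # False
```

In practice the second argument would be `torch.__version__`.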

    # V5
    import torch

    language = 'ru'
    model_id = 'v5_ru'
    sample_rate = 48000
    speaker = 'xenia'
    device = torch.device('cpu')

    # The model ID is passed via the `speaker` argument of torch.hub.load
    model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                         model='silero_tts',
                                         language=language,
                                         speaker=model_id)
    model.to(device)  # gpu or cpu

    audio = model.apply_tts(text=example_text,
                            speaker=speaker,
                            sample_rate=sample_rate)

- Standalone usage only requires PyTorch 1.12+ and the Python Standard Library;
- Please see the detailed examples in Colab;

    # V5
    import os
    import torch

    device = torch.device('cpu')
    torch.set_num_threads(4)
    local_file = 'model.pt'

    # Download the model package once and reuse the local copy afterwards
    if not os.path.isfile(local_file):
        torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/v5_ru.pt',
                                       local_file)

    model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")
    model.to(device)

    example_text = 'Меня зовут Лева Королев. Я из готов. И я уже готов открыть все ваши замки любой сложности!'
    sample_rate = 48000
    speaker = 'baya'

    audio_paths = model.save_wav(text=example_text,
                                 speaker=speaker,
                                 sample_rate=sample_rate)

Check out our TTS Wiki page.
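If you call `apply_tts` rather than `save_wav`, you need to write the audio yourself. The sketch below shows one way to do that with the standard `wave` module, assuming the audio is a one-dimensional sequence of floats in the range [-1, 1] (for a tensor, something like `audio.tolist()` would produce such a sequence). A generated sine wave stands in for real model output, and `write_wav` is a hypothetical helper.

```python
import math
import os
import struct
import tempfile
import wave


def write_wav(path, samples, sample_rate):
    """Write mono float samples in [-1, 1] as 16-bit PCM."""
    # Clip to the valid range, then scale to signed 16-bit integers
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16 bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Stand-in for model output: 0.5 s of a 440 Hz tone at 48 kHz
sample_rate = 48000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate // 2)]
out_path = os.path.join(tempfile.gettempdir(), "tone.wav")
write_wav(out_path, tone, sample_rate)
```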
To be superseded by v5 model(s) soon.
Supported tokenset:
!,-.:?iµöабвгдежзийклмнопрстуфхцчшщъыьэюяёђѓєіјњћќўѳғҕҗҙқҡңҥҫүұҳҷһӏӑӓӕӗәӝӟӥӧөӱӳӵӹ
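Text containing characters outside this tokenset may need cleaning before synthesis. A minimal sketch that reports such characters, assuming the input is lowercased first and that a space is also accepted (the space is an assumption; it is not visible in the list above). `unsupported_chars` is a hypothetical helper.

```python
TOKENSET = "!,-.:?iµöабвгдежзийклмнопрстуфхцчшщъыьэюяёђѓєіјњћќўѳғҕҗҙқҡңҥҫүұҳҷһӏӑӓӕӗәӝӟӥӧөӱӳӵӹ"
ALLOWED = set(TOKENSET) | {" "}  # assumption: spaces are allowed too


def unsupported_chars(text):
    """Return the sorted characters of `text` that are not in the tokenset."""
    return sorted(set(text.lower()) - ALLOWED)


print(unsupported_chars("Привет, мир!"))  # []
print(unsupported_chars("hello"))         # ['e', 'h', 'l', 'o']
```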
| Speaker_ID | Language | Gender |
|---|---|---|
| b_ava | Avar | F |
| b_bashkir | Bashkir | M |
| b_bulb | Bulgarian | M |
| b_bulc | Bulgarian | M |
| b_che | Chechen | M |
| b_cv | Chuvash | M |
| cv_ekaterina | Chuvash | F |
| b_myv | Erzya | M |
| b_kalmyk | Kalmyk | M |
| b_krc | Karachay-Balkar | M |
| kz_M1 | Kazakh | M |
| kz_M2 | Kazakh | M |
| kz_F3 | Kazakh | F |
| kz_F1 | Kazakh | F |
| kz_F2 | Kazakh | F |
| b_kjh | Khakas | F |
| b_kpv | Komi-Zyrian | M |
| b_lez | Lezghian | M |
| b_mhr | Mari | F |
| b_mrj | Hill Mari | M |
| b_nog | Nogai | F |
| b_oss | Ossetic | M |
| b_ru | Russian | M |
| b_tat | Tatar | M |
| marat_tt | Tatar | M |
| b_tyv | Tuvinian | M |
| b_udm | Udmurt | M |
| b_uzb | Uzbek | M |
| b_sah | Yakut | M |
| kalmyk_erdni | Kalmyk | M |
| kalmyk_delghir | Kalmyk | F |
Important: all input sentences must be romanized to ISO format using `aksharamukha`. An example for Hindi:

    # V4
    import torch
    from aksharamukha import transliterate

    # Loading model
    model, example_text = torch.hub.load(repo_or_dir='snakers4/silero-models',
                                         model='silero_tts',
                                         language='indic',
                                         speaker='v4_indic')

    orig_text = "प्रसिद्द कबीर अध्येता, पुरुषोत्तम अग्रवाल का यह शोध आलेख, उस रामानंद की खोज करता है"
    # Romanize the Devanagari input to ISO before synthesis
    roman_text = transliterate.process('Devanagari', 'ISO', orig_text)
    print(roman_text)

    audio = model.apply_tts(roman_text,
                            speaker='hindi_male')

| Language | Speakers | Romanization function |
|---|---|---|
| hindi | hindi_female, hindi_male | `transliterate.process('Devanagari', 'ISO', orig_text)` |
| malayalam | malayalam_female, malayalam_male | `transliterate.process('Malayalam', 'ISO', orig_text)` |
| manipuri | manipuri_female | `transliterate.process('Bengali', 'ISO', orig_text)` |
| bengali | bengali_female, bengali_male | `transliterate.process('Bengali', 'ISO', orig_text)` |
| rajasthani | rajasthani_female | `transliterate.process('Devanagari', 'ISO', orig_text)` |
| tamil | tamil_female, tamil_male | `transliterate.process('Tamil', 'ISO', orig_text, pre_options=['TamilTranscribe'])` |
| telugu | telugu_female, telugu_male | `transliterate.process('Telugu', 'ISO', orig_text)` |
| gujarati | gujarati_female, gujarati_male | `transliterate.process('Gujarati', 'ISO', orig_text)` |
| kannada | kannada_female, kannada_male | `transliterate.process('Kannada', 'ISO', orig_text)` |
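The table above can also be expressed as a lookup so that the right romanization call is chosen per language. A minimal sketch, assuming the script names and options from that table; `ROMANIZATION` and `romanize` are hypothetical names, and the `process` callable is passed in so that the mapping can be exercised without `aksharamukha` installed.

```python
# language -> (source script, extra keyword arguments), copied from the table
ROMANIZATION = {
    "hindi": ("Devanagari", {}),
    "malayalam": ("Malayalam", {}),
    "manipuri": ("Bengali", {}),
    "bengali": ("Bengali", {}),
    "rajasthani": ("Devanagari", {}),
    "tamil": ("Tamil", {"pre_options": ["TamilTranscribe"]}),
    "telugu": ("Telugu", {}),
    "gujarati": ("Gujarati", {}),
    "kannada": ("Kannada", {}),
}


def romanize(language, text, process):
    """Call `process(source_script, 'ISO', text, **options)` for `language`."""
    source_script, options = ROMANIZATION[language]
    return process(source_script, "ISO", text, **options)


# Stand-in for aksharamukha's transliterate.process, used here only to show
# which arguments would be passed.
def show_call(src, dst, text, **options):
    return (src, dst, text, options)


print(romanize("tamil", "text", show_call))
```

With the real library installed, the call would look like `romanize('hindi', orig_text, transliterate.process)`.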
Try our models, create an issue, join our chat, email us, and read the latest news.

    @misc{silero_models,
      author = {Silero Team},
      title = {Silero Models: pre-trained enterprise-grade STT / TTS models and benchmarks},
      year = {2021},
      publisher = {GitHub},
      journal = {GitHub repository},
      howpublished = {\url{https://github.com/snakers4/silero-models}},
      commit = {insert_some_commit_here},
      email = {hello@silero.ai}
    }

- STT:
- TTS:
- VAD:
- Text Enhancement:
  - We have published a model for text repunctuation and recapitalization for four languages - link
- STT:
  - OpenAI has solved speech recognition! Let's figure out whether that is true … - link
  - Our free speech recognition services have become better and more convenient - link
  - The Silero Telegram bot converts speech to text for free - link
  - Free speech recognition for everyone - link
  - The latest updates to the speech recognition models in Silero Models - link
  - Compressing transformers: simple, universal, and practical ways to make them compact and fast - link
  - The ultimate comparison of speech recognition systems: Ashmanov, Google, Sber, Silero, Tinkoff, Yandex - link
  - We have published modern STT models comparable in quality to Google - link
  - Lowering the barriers to entry in speech recognition - link
  - A huge open dataset of Russian speech, version 1.0 - link
  - How fast can an STT system be made? - link
  - Our Speech-To-Text system - link
  - Speech-To-Text - link
- TTS:
  - Building fast, high-quality, and accessible speech synthesis for the languages of Russia: your participation is needed - link
  - We have solved the problem of homographs and stress in the Russian language - link
  - Our speech synthesis is now also available as a Telegram bot - link
  - Can speech synthesis fool a biometric identification system? - link
  - Our speech synthesis now covers 20 languages - link
  - Our public speech synthesis is now in super-high quality, 10 times faster, and free of teething problems - link
  - Synthesizing the voices of grandma, grandpa, and Lenin + news about our public speech synthesis - link
  - We have made our public speech synthesis even better - link
  - We have published high-quality, simple, accessible, and fast speech synthesis - link
- VAD:
  - New release of the public voice detector Silero VAD v6 - link
  - Our public voice detector has become better - link
  - Do you use VAD? What it is and why you need it - link
  - Models for speech detection, number detection, and language recognition - link
  - We have published a modern Voice Activity Detector and more - link
- Text Enhancement:
