GitHub topics: multi-speaker

Repositories

netease-youdao/EmotiVoice

EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

Language: Python - Size: 3.67 MB - Last synced at: 11 days ago - Pushed at: about 1 year ago - Stars: 8,280 - Forks: 724

r9y9/deepvoice3_pytorch

PyTorch implementation of convolutional neural network-based text-to-speech synthesis models

Language: Python - Size: 6.78 MB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 1,980 - Forks: 487

mikebrady/shairport-sync (fork of abrasive/shairport)

AirPlay and AirPlay 2 audio player

Language: C - Size: 11.6 MB - Last synced at: 28 days ago - Pushed at: 29 days ago - Stars: 8,093 - Forks: 609

aishoot/LSTM_PIT_Speech_Separation

Two-talker speech separation with LSTM/BLSTM using the Permutation Invariant Training (PIT) method.

Language: Jupyter Notebook - Size: 7.38 MB - Last synced at: 12 days ago - Pushed at: over 3 years ago - Stars: 309 - Forks: 90

keonlee9420/Comprehensive-Transformer-TTS

A non-autoregressive Transformer-based text-to-speech system, supporting a family of SOTA transformers with supervised and unsupervised duration modeling. This project grows with the research community, aiming to achieve the ultimate TTS.

Language: Python - Size: 143 MB - Last synced at: 3 days ago - Pushed at: almost 3 years ago - Stars: 325 - Forks: 42

TheSeraphim/scribe-forge-ai

🎵 Complete offline audio transcription system with speaker diarization using OpenAI Whisper and PyAnnote. Features automatic audio cleaning, precise timestamps, multiple output formats (JSON/TXT/Markdown), and support for 20+ audio formats. No external APIs required - works entirely offline.

Language: Python - Size: 2.32 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

anton-jeran/MULTI-AUDIODEC

This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture.

Language: Python - Size: 7.41 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 46 - Forks: 6

keonlee9420/Comprehensive-E2E-TTS

A non-autoregressive end-to-end text-to-speech (text-to-wav) system, supporting a family of SOTA unsupervised duration modeling methods. This project grows with the research community, aiming to achieve the ultimate E2E-TTS.

Language: Python - Size: 3.45 MB - Last synced at: 3 days ago - Pushed at: over 3 years ago - Stars: 146 - Forks: 19

ranchlai/mandarin-tts

Chinese Mandarin text-to-speech (TTS) based on FastSpeech 2, implemented in PyTorch, using WaveGlow as the vocoder, with the Biaobei and AISHELL-3 datasets

Language: Python - Size: 85.4 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 446 - Forks: 106

nikitashvarts/CocktailPartySpeakerRecognition

An Algorithm for Speaker Recognition in a Multi-Speaker Environment

Language: Python - Size: 15.6 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 3 - Forks: 1

ZoraizQ/urdu-speech-recognition

Urdu Speech Recognition using Kaldi ASR, by training Triphone Acoustic GMMs using the PRUS dataset.

Language: Shell - Size: 1.16 GB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 3 - Forks: 0

keonlee9420/Comprehensive-Tacotron2

PyTorch implementation of Google's Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. This implementation supports both single- and multi-speaker TTS, along with several techniques to improve the robustness and efficiency of the model.

Language: Python - Size: 130 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 31 - Forks: 8

Related Keywords

multi-speaker (15), tts (7), pytorch (7), deep-learning (6), speech-synthesis (5), fastspeech2 (4), text-to-speech (4), hifi-gan (3), neural-tts (3), speech-separation (3), python (3), single-speaker (3), audio-processing (2), comprehensive (2), unsupervised (2), audio-separation (2), tacotron (2), speech-enhancement (2), mel-gan (2), non-ar (2), machine-learning (2), end-to-end (2), non-autoregressive (2), speech-recognition (2), sota (2), ultimate-tts (2), tensorflow (1), tts-chinese (1), aishell3 (1), text-to-wav (1), jets (1), spatial-audio (1), room-impulse-responses (1), room-impulse-response (1), rir (1), overlapping-speech (1), overlap (1), neural-coding (1), codec (1), binaural (1), audio-codecs (1), source-separation (1), deeplearning (1), deep-learning-architectures (1), adaptive-learning (1), transfer-learning (1), korean (1), tacotron2 (1), robustness (1), reduction-factor (1), efficiency (1), diagonal-guided-attention (1), autoregressive (1), urdu (1), prus (1), kaldi-asr (1), speaker-recognition (1), lstm (1), cocktail-party-problem (1), tts-hanzi (1), whisper (1), robust-speech-recognition (1), permutation-invariant-training (1), synchronized-audio (1), multi-room-audio (1), embedded-systems (1), audio-streaming (1), audio-player (1), audio (1), airplay-2 (1), airplay (1), speech-processing (1), style (1), speech (1), prompt (1), emotivoice (1), emotion (1), ai (1), transcription-tool (1), timestamps (1), speech-to-text (1), speaker-diarization (1), pyannote (1), openai-whisper (1), offline-transcription (1), nlp (1), huggingface (1), ffmpeg (1), diarization (1), audio-transcription (1), audio-cleaning (1), audio-analysis (1), transformer (1), supervised (1), fastspeech (1)
