GitHub topics: speech-synthesis

Repositories

NVIDIA/NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python - Size: 451 MB - Last synced at: about 10 hours ago - Pushed at: about 10 hours ago - Stars: 14,912 - Forks: 2,953

UKR-PROJECTS/chatterbox-tts-colab

Transform any text into natural-sounding speech, clone voices from audio samples, and create professional voiceovers - all running free in Google Colab!

Language: Jupyter Notebook - Size: 18.6 KB - Last synced at: about 16 hours ago - Pushed at: about 16 hours ago - Stars: 1 - Forks: 0

espeak-ng/espeak-ng

eSpeak NG is an open-source speech synthesizer that supports more than a hundred languages and accents.

Language: C - Size: 72.4 MB - Last synced at: about 20 hours ago - Pushed at: 14 days ago - Stars: 5,211 - Forks: 1,043

TheVoxProject/calcvox

Accessible and open-source talking calculator for everyone.

Language: C++ - Size: 1010 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 3 - Forks: 1

voicepaw/so-vits-svc-fork

so-vits-svc fork with realtime support, improved interface and more features.

Language: Python - Size: 20.2 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 9,044 - Forks: 1,202

Self-host the powerful Dia TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), support for SafeTensors/BF16, voice cloning, dialogue generation, and GPU/CPU execution.

Language: Python - Size: 571 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 3 - Forks: 0

denizsafak/abogen

Generate audiobooks from EPUBs, PDFs and text with synchronized captions.

Language: Python - Size: 2.09 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 322 - Forks: 17

crispinprojects/talkcalendar

Talk Calendar is a personal desktop calendar for Linux which has some speech capability.

Language: C - Size: 234 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2 - Forks: 0

leon-ai/leon

🧠 Leon is your open-source personal assistant.

Language: TypeScript - Size: 21.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 16,397 - Forks: 1,362

sine2pi/asr_model

NLP model with acoustic positional encoding.

Language: Python - Size: 638 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

huggingface/speech-to-speech

Speech To Speech: an effort toward an open-source, modular GPT-4o

Language: Python - Size: 299 KB - Last synced at: 3 days ago - Pushed at: 2 months ago - Stars: 4,076 - Forks: 462

Swap98-Coder/mlx-audio

A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.

Size: 1.95 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2 - Forks: 0

lmnt-com/diffwave

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.

Language: Python - Size: 20.5 KB - Last synced at: about 5 hours ago - Pushed at: about 1 year ago - Stars: 844 - Forks: 119

Lyrcaxis/KokoroSharp

Fast local TTS inference engine in C# with ONNX Runtime. Multi-speaker, multi-platform and multilingual. Integrate it into your .NET projects using a plug-and-play NuGet package, complete with all voices.

Language: C# - Size: 107 KB - Last synced at: 2 days ago - Pushed at: 16 days ago - Stars: 127 - Forks: 9

NVIDIA/DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Language: Jupyter Notebook - Size: 104 MB - Last synced at: 3 days ago - Pushed at: 11 months ago - Stars: 14,346 - Forks: 3,349

ELANOELR/EchoForge-AI-Voice-Cloner-GUI

Offline AI voice cloning tool with real-time TTS GUI. No login. No GPU required. Perfect for content creators.

Language: Python - Size: 834 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 5 - Forks: 1

fabiolimace/espeak-ng-playground

A space for experimenting with and developing improvements to `espeak-ng`, focused on Brazilian Portuguese. Main repository: https://github.com/fabiolimace/espeak-ng/

Language: Awk - Size: 71.3 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

NVIDIA/BigVGAN

Official PyTorch implementation of BigVGAN (ICLR 2023)

Language: Python - Size: 19.9 MB - Last synced at: 3 days ago - Pushed at: 10 months ago - Stars: 1,047 - Forks: 133

jim11662418/General_Instrument_CTS256_SP0256_Speech_Synthesizer

Vintage General Instrument Speech Synthesizer CTS256 with SP0256

Language: Assembly - Size: 15.2 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 9 - Forks: 2

mkiol/dsnote

Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.

Language: C++ - Size: 76 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 937 - Forks: 39

Avatar-Home-Automation/A.V.A.T.A.R-Server

Agnostic Virtual Assistant for The Automated Residences

Language: JavaScript - Size: 22.2 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 3 - Forks: 0

SocAIty/socaity

SDK for generative AI.

Language: Python - Size: 26.2 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

sanushka2025/Microsoft_Windows

Programs and tools for Windows.

Language: Python - Size: 9.42 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

echogarden-project/echogarden

Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice isolation, language detection and more.

Language: TypeScript - Size: 2.4 MB - Last synced at: 3 days ago - Pushed at: 28 days ago - Stars: 373 - Forks: 40

rany2/edge-tts

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

Language: Python - Size: 2.08 MB - Last synced at: 4 days ago - Pushed at: about 2 months ago - Stars: 8,485 - Forks: 793

RHVoice/RHVoice

A free and open-source speech synthesizer for Russian and other languages

Language: C++ - Size: 14.3 MB - Last synced at: about 20 hours ago - Pushed at: 10 days ago - Stars: 1,662 - Forks: 245

Samba250/Mars

Explore Mars, the fourth planet from the Sun, known for its reddish surface and intriguing geological features. 🚀 Join the mission to uncover its secrets and pave the way for future human exploration! 🌌

Size: 19.3 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

keithito/tacotron

A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)

Language: Python - Size: 110 KB - Last synced at: 3 days ago - Pushed at: almost 2 years ago - Stars: 2,978 - Forks: 956

ManimCommunity/manim-voiceover

Manim plugin for all things voiceover

Language: Python - Size: 879 KB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 226 - Forks: 55

NaomiProject/Naomi

The Naomi Project is an open source, technology agnostic platform for developing always-on, voice-controlled applications!

Language: Python - Size: 5.27 MB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 278 - Forks: 60

WhisperSpeech/WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.

Language: Jupyter Notebook - Size: 38 MB - Last synced at: 6 days ago - Pushed at: 17 days ago - Stars: 4,286 - Forks: 240

sasawasewq/DubFlow

DubFlow is an AI tool that transforms YouTube videos into multiple languages, making content accessible to a wider audience. With features like automatic transcript extraction and natural-sounding speech generation, it simplifies the dubbing process for creators. 🐙✨

Language: JavaScript - Size: 3.03 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

PaddlePaddle/PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Language: Python - Size: 69.2 MB - Last synced at: 6 days ago - Pushed at: 15 days ago - Stars: 11,996 - Forks: 1,920

ssb22/CedPane

Chinese-English Dictionary Public-domain Additions for Names Etc (CedPane)

Size: 35.8 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 4 - Forks: 1

Blaizzy/mlx-audio

A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.

Language: Python - Size: 87.4 MB - Last synced at: 6 days ago - Pushed at: 15 days ago - Stars: 2,401 - Forks: 176

EveryVoiceTTS/EveryVoice

The EveryVoice TTS Toolkit - Text To Speech for your language

Language: Python - Size: 9.25 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 35 - Forks: 2

r9y9/pysptk

A Python wrapper for the Speech Signal Processing Toolkit (SPTK).

Language: Python - Size: 15.3 MB - Last synced at: about 11 hours ago - Pushed at: 11 months ago - Stars: 442 - Forks: 78

gexgd0419/NaturalVoiceSAPIAdapter

Make Azure natural TTS voices accessible to any SAPI 5-compatible application.

Language: C++ - Size: 27 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 366 - Forks: 22

SocAIty/SpeechCraft (fork of suno-ai/bark)

🔊 Text2Speech, Voice-Cloning and Voice2Voice conversion with the text-prompted generative audio model bark

Language: Python - Size: 9.78 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 65 - Forks: 6

rhasspy/piper

A fast, local neural text to speech system

Language: C++ - Size: 208 MB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 9,351 - Forks: 734

KoljaB/RealtimeTTS

Converts text to speech in realtime

Language: Python - Size: 68.1 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 3,186 - Forks: 316

DmitryRyumin/INTERSPEECH-2023-24-Papers

INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023 and 2024 conferences. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!

Size: 11.4 MB - Last synced at: 4 days ago - Pushed at: 6 months ago - Stars: 674 - Forks: 42

alexykn/TorchTS

A modern text to speech frontend for Kokoro-82M

Language: JavaScript - Size: 4.13 MB - Last synced at: 1 day ago - Pushed at: 8 days ago - Stars: 5 - Forks: 2

gabrielmittag/NISQA

NISQA - Non-Intrusive Speech Quality and TTS Naturalness Assessment

Language: Python - Size: 2.2 MB - Last synced at: 7 days ago - Pushed at: 7 months ago - Stars: 802 - Forks: 132

thorstenMueller/Thorsten-Voice

Thorsten-Voice: A free-to-use, offline, high-quality German TTS voice that should be available to every project without any licensing hassles.

Language: Python - Size: 16.6 MB - Last synced at: 4 days ago - Pushed at: 6 months ago - Stars: 619 - Forks: 53

stakira/OpenUtau

Open singing synthesis platform / Open source UTAU successor

Language: C# - Size: 77.6 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 2,857 - Forks: 361

microsoft/SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Language: Python - Size: 17.8 MB - Last synced at: 5 days ago - Pushed at: about 1 year ago - Stars: 1,369 - Forks: 127

ictnlp/Stream-Omni

Stream-Omni is an end-to-end language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.

Language: Python - Size: 10.6 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 1 - Forks: 0

espnet/espnet

End-to-End Speech Processing Toolkit

Language: Python - Size: 1.15 GB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 9,202 - Forks: 2,278

leaonline/easy-speech

🔊 Cross browser Speech Synthesis also known as Text to speech or TTS; no dependencies; uses Web Speech API

Language: JavaScript - Size: 1.12 MB - Last synced at: 7 days ago - Pushed at: about 2 months ago - Stars: 234 - Forks: 24

SanHacks/AiGen

Multi Model Personal Assistant Wrapper in Go: Interact with ChatGPT, Claude or Ollama Cross Platform (Speech & Image generation supported)

Language: Go - Size: 3.41 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 14 - Forks: 4

leminhnguyen/ai-speech-engineer-roadmap

A curated roadmap, based on my 5 years of experience, for going from zero to becoming a skilled AI Speech Engineer. This roadmap covers everything from fundamentals to cutting-edge research trends in the speech domain.

Size: 4.35 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 14 - Forks: 0

modelscope/FunCodec

FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.

Language: Python - Size: 1.46 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 409 - Forks: 32

tensorflow/lingvo

Lingvo

Language: Python - Size: 142 MB - Last synced at: 4 days ago - Pushed at: 6 days ago - Stars: 2,843 - Forks: 449

sdkcarlos/artyom.js

A voice control, voice command, speech recognition and speech synthesis JavaScript library. Create your own Siri, Google Now or Cortana with Google Chrome within your website.

Language: JavaScript - Size: 1.08 MB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 1,261 - Forks: 366

ssb22/gradint

Graduated Interval Recall program

Language: Python - Size: 59.4 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 20 - Forks: 4

KennethanCeyer/awesome-audio-speech

Awesome list of Audio, Speech, and DSP (Digital Signal Processing)

Size: 847 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 1

zzw922cn/awesome-speech-recognition-speech-synthesis-papers

Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)

Size: 197 KB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 3,048 - Forks: 513

Migushthe2nd/MsEdgeTTS

A simple Azure Speech Service module that uses the Microsoft Edge Read Aloud API

Language: TypeScript - Size: 265 KB - Last synced at: 11 days ago - Pushed at: 6 months ago - Stars: 306 - Forks: 45

IEvangelist/learning-blazor

The application for the "Learning Blazor: Build Single Page Apps with WebAssembly and C#" O'Reilly Media book by David Pine.

Language: C# - Size: 7.47 MB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 135 - Forks: 41

NVIDIA/flowtron

Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer

Language: Jupyter Notebook - Size: 2.76 MB - Last synced at: 3 days ago - Pushed at: almost 2 years ago - Stars: 899 - Forks: 176

Vaibhavs10/ml-with-audio

HF's ML for Audio study group

Language: Jupyter Notebook - Size: 5.12 MB - Last synced at: 5 days ago - Pushed at: over 2 years ago - Stars: 192 - Forks: 29

JackismyShephard/ultimate-rvc

An app for creating audio-based content such as song covers and speech using Retrieval-based Voice Conversion.

Language: Python - Size: 7.65 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 114 - Forks: 24

rhasspy/piper-samples

Samples for Piper text to speech system

Language: Python - Size: 573 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 6 - Forks: 3

kakaobrain/pororo 📦

PORORO: Platform Of neuRal mOdels for natuRal language prOcessing

Language: Python - Size: 12.8 MB - Last synced at: 11 days ago - Pushed at: over 3 years ago - Stars: 1,297 - Forks: 223

DigitalPhonetics/IMS-Toucan

Controllable and fast Text-to-Speech for over 7000 languages!

Language: Python - Size: 21.4 MB - Last synced at: 11 days ago - Pushed at: about 1 month ago - Stars: 1,611 - Forks: 183

kosich/rxjs-tts

RxJS wrapper for Text-to-Speech Web API

Language: TypeScript - Size: 563 KB - Last synced at: 9 days ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 3

mikeroyal/NLP-Guide

Natural Language Processing (NLP). Covering topics such as Tokenization, Part Of Speech tagging (POS), Machine translation, Named Entity Recognition (NER), Classification, and Sentiment analysis.

Language: Python - Size: 315 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 93 - Forks: 15

Shristirajpoot/CalcVoive

🎙️ Voice-enabled calculator built with React | Supports speech input/output & smart math parsing

Language: CSS - Size: 1.17 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 0

ryota-komatsu/speech_resynth

Speech Resynthesis and Language Modeling

Language: Python - Size: 4.86 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 17 - Forks: 4

alphacep/awesome-russian-speech

Russian speech technology links

Size: 134 KB - Last synced at: 14 days ago - Pushed at: about 1 month ago - Stars: 309 - Forks: 22

michaelzhang-ai/Text2Video

ICASSP 2022: "Text2Video: text-driven talking-head video synthesis with phonetic dictionary".

Language: Python - Size: 209 MB - Last synced at: 11 days ago - Pushed at: about 2 years ago - Stars: 436 - Forks: 94

csun22/Synthetic-Voice-Detection-Vocoder-Artifacts

This repository contains the dataset and detection code from our paper "AI-Synthesized Voice Detection Using Neural Vocoder Artifacts", accepted at the CVPR 2023 Workshop on Media Forensics.

Language: Python - Size: 183 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 121 - Forks: 14

estuary-ai/mangrove

Mangrove is the backend module of Estuary, a framework for building multimodal real-time Socially Intelligent Agents (SIAs).

Language: Python - Size: 2.16 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 11 - Forks: 2

yukiarimo/hanasu

Hanasu is a human-like TTS model based on the multilingual Himitsu V1 transformer-based encoder and VITS architecture

Language: Python - Size: 5.58 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 28 - Forks: 5

ZYiHu/EmoVoiceChatbot

Emotional voice chatbot with sentiment-based speech synthesis

Language: Python - Size: 21.5 KB - Last synced at: 15 days ago - Pushed at: 16 days ago - Stars: 1 - Forks: 0

oscie57/tiktok-voice

Simple Python script to interact with the TikTok TTS API

Language: Python - Size: 1.24 MB - Last synced at: 13 days ago - Pushed at: 9 months ago - Stars: 584 - Forks: 86

Steve0929/tiktok-tts

Provides a simple way to generate text-to-speech audio files using TikTok's text-to-speech (TTS) API in Node.js.

Language: JavaScript - Size: 628 KB - Last synced at: 8 days ago - Pushed at: 8 months ago - Stars: 90 - Forks: 9

devnen/Chatterbox-TTS-Server

Self-host the powerful Chatterbox TTS model. This server offers a user-friendly Web UI, flexible API endpoints (incl. OpenAI compatible), predefined voices, voice cloning, and large audiobook-scale text processing. Runs accelerated on NVIDIA (CUDA), AMD (ROCm), and CPU.

Language: Python - Size: 18.5 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 212 - Forks: 36

Badri467/DubFlow

DubFlow lets you effortlessly dub YouTube videos into any language with high-quality translations and synced audio. Simply enter a YouTube URL, choose your target language, and get a dubbed video ready to share. Perfect for creators and viewers looking to break language barriers.

Language: JavaScript - Size: 120 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1 - Forks: 0