GitHub topics: speech-recognition
MeowMeowSE3/language-detection-ai
Detect 18+ languages instantly using machine learning (BERT, LSTM, SVM) and NLP. Includes a Flask web app for real-time predictions, trained models, and detailed notebooks.
Language: Jupyter Notebook - Size: 1.76 MB - Last synced at: about 2 hours ago - Pushed at: about 3 hours ago - Stars: 0 - Forks: 1

ShakeelKhalid1913/SignTalkApp
Language: Dart - Size: 48.7 MB - Last synced at: about 3 hours ago - Pushed at: about 5 hours ago - Stars: 3 - Forks: 0

aniemore/Aniemore
Emotion recognition from audio and text files (Russian language only)
Language: Python - Size: 2.11 MB - Last synced at: about 4 hours ago - Pushed at: about 5 hours ago - Stars: 71 - Forks: 8

z430/keyword-spotting
Implementation of keyword spotting (wake-word detection)
Language: Python - Size: 210 KB - Last synced at: about 5 hours ago - Pushed at: about 6 hours ago - Stars: 1 - Forks: 0

MadhupriyaAdapa/Voice-Assistant
Jarvis is a Python-based voice assistant that responds to voice commands to perform tasks like web browsing, fetching information, managing system operations, and more. It uses speech recognition and text-to-speech to interact naturally with the user.
Language: Python - Size: 2.93 KB - Last synced at: about 5 hours ago - Pushed at: about 6 hours ago - Stars: 0 - Forks: 0

CentralLabFacilities/speech_recognition
Language: Python - Size: 149 KB - Last synced at: about 7 hours ago - Pushed at: about 8 hours ago - Stars: 0 - Forks: 1

huggingface/transformers
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Language: Python - Size: 307 MB - Last synced at: about 8 hours ago - Pushed at: about 8 hours ago - Stars: 145,965 - Forks: 29,434

pradipchaudhary/ainepal
🎓 Learn | 📢 Aware | 💡 Inform – Helping Nepal Live Smarter
Language: TypeScript - Size: 1.37 MB - Last synced at: about 8 hours ago - Pushed at: about 9 hours ago - Stars: 2 - Forks: 1

sine2pi/Maxfactor
"Remember kids there are three ways to do things, the right way, the wrong way, and the Max Powers way"
Language: Python - Size: 221 KB - Last synced at: about 9 hours ago - Pushed at: about 10 hours ago - Stars: 0 - Forks: 0

Konjevod1/OtosakuStreamingASR-iOS
OtosakuStreamingASR-iOS offers a simple way to integrate real-time speech recognition into your iOS apps. With its efficient on-device processing, you can enhance user experiences without relying on internet connectivity. 🐙✨
Language: Swift - Size: 1.64 MB - Last synced at: about 10 hours ago - Pushed at: about 11 hours ago - Stars: 0 - Forks: 0

JackySDEC/Speech-to-Text_Traditional_Method
Explore a speech recognition system using HMM and GMM techniques. Classify spoken words from audio with feature extraction methods. 🌐🎤
Language: Jupyter Notebook - Size: 524 KB - Last synced at: about 13 hours ago - Pushed at: about 14 hours ago - Stars: 0 - Forks: 0

asifalam6310/python-basics-practice
This repository showcases my Python learning journey through CodeWithHarry tutorials. Each file addresses fundamental topics, helping others learn alongside me. 🐍📚
Language: Python - Size: 16.6 KB - Last synced at: about 13 hours ago - Pushed at: about 15 hours ago - Stars: 0 - Forks: 0

linto-ai/whisper-timestamped
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
Language: Python - Size: 4.49 MB - Last synced at: about 12 hours ago - Pushed at: 3 months ago - Stars: 2,460 - Forks: 189

saharmor/whisper-playground
Build real-time speech-to-text web apps using OpenAI's Whisper (https://openai.com/blog/whisper/)
Language: Python - Size: 407 KB - Last synced at: about 19 hours ago - Pushed at: about 1 year ago - Stars: 815 - Forks: 140

arkam-ahamed/AI-Tutor
AI English Tutor - Voice-powered conversational English practice app with real-time speech recognition, AI feedback, and text-to-speech. Built with React frontend and Spring Boot backend using Google Gemini AI.
Language: JavaScript - Size: 0 Bytes - Last synced at: about 22 hours ago - Pushed at: about 23 hours ago - Stars: 0 - Forks: 0

FL33TW00D/whisper-turbo
Cross-Platform, GPU Accelerated Whisper 🏎️
Language: TypeScript - Size: 3.83 MB - Last synced at: about 17 hours ago - Pushed at: over 1 year ago - Stars: 1,801 - Forks: 82

mbjmgjgjgj/Hogwarts-Legacy-Voice-Spellcaster
Voice control of magic for the game `Hogwarts Legacy`. This project lets you cast spells in the game by voice, using Python and Vosk for speech recognition.
Language: Python - Size: 55.6 MB - Last synced at: about 23 hours ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

KChantal/SignBridge
Building Bridges for Inclusive Communication
Language: TypeScript - Size: 262 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 2 - Forks: 0

default741/speak-sense
AI-powered tool for detecting spoken languages from audio using machine learning, deep learning, and signal processing techniques.
Language: Jupyter Notebook - Size: 285 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

heypoom/archival-intelligences
Archival Intelligences. (TBA)
Language: TypeScript - Size: 5.74 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 3 - Forks: 1

leon-ai/leon
🧠 Leon is your open-source personal assistant.
Language: TypeScript - Size: 21.3 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 16,397 - Forks: 1,362

Picovoice/porcupine
On-device wake word detection powered by deep learning
Language: Python - Size: 339 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 4,195 - Forks: 533

Olney1/ChatGPT-OpenAI-Smart-Speaker
This AI Smart Speaker uses speech recognition, TTS (text-to-speech), and STT (speech-to-text) to enable voice and vision-driven conversations, with additional web search capabilities via OpenAI and Langchain agents.
Language: Python - Size: 145 MB - Last synced at: about 4 hours ago - Pushed at: 7 months ago - Stars: 293 - Forks: 33

NamelessWonderer0/Chat-app
Real-time chat app built with the MERN stack and Socket.io for instant messaging. Connect and chat with multiple users seamlessly! 🐙💻
Language: JavaScript - Size: 6.32 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

sine2pi/asr_model
NLP model with acoustic positional encoding.
Language: Python - Size: 638 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

newcomerxD/server-gen
Server-Gen is a versatile tool that helps you monitor system metrics and sends regular reports via email. With features like cron-style scheduling and a built-in health check endpoint, it ensures your server stays in top shape. 🐙✨
Language: Go - Size: 39.1 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

selmonix/OtosakuKWS-iOS
OtosakuKWS is a lightweight, on-device keyword spotting engine for iOS that detects speech commands in real time. This project relies on a CRNN CoreML model for efficient and accurate voice command recognition. 🐙🌟
Language: Swift - Size: 1.24 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

proger/uk
Phonograms and syntagms: processing tools
Language: Python - Size: 8.11 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 21 - Forks: 0

Swap98-Coder/mlx-audio
A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.
Size: 1.95 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2 - Forks: 0

alan-ai/alan-sdk-reactnative
The Self-Coding System for Your App — Alan AI SDK for React Native
Language: Ruby - Size: 188 MB - Last synced at: about 3 hours ago - Pushed at: 2 months ago - Stars: 587 - Forks: 16

NVIDIA/DeepLearningExamples
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
Language: Jupyter Notebook - Size: 104 MB - Last synced at: 1 day ago - Pushed at: 11 months ago - Stars: 14,346 - Forks: 3,349

nl8590687/ASRT_SpeechRecognition
A Deep-Learning-Based Chinese Speech Recognition System
Language: Python - Size: 7.77 MB - Last synced at: 1 day ago - Pushed at: 9 months ago - Stars: 8,153 - Forks: 1,911

rakib-0/SubtitleGenerator
SubtitleGenerator is an interactive tool that automatically generates and translates subtitles for your videos using AI. It supports multiple languages and formats, making it easy to enhance your video content. 🛠️✨
Language: Python - Size: 27.3 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 1

k2-fsa/sherpa-ncnn
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn, without an Internet connection. Supports iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A, etc.
Language: C++ - Size: 2.08 MB - Last synced at: about 5 hours ago - Pushed at: 24 days ago - Stars: 1,369 - Forks: 185

SahilAggarwal2004/react-text-to-speech
An easy-to-use React.js component that leverages the Web Speech API to convert text to speech.
Language: TypeScript - Size: 2.44 MB - Last synced at: about 14 hours ago - Pushed at: 4 days ago - Stars: 64 - Forks: 5

mkiol/dsnote
Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
Language: C++ - Size: 76 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 937 - Forks: 39

Brooklyn-Dev/Ultron-AI
Voice-controlled AI gaming assistant for Marvel Rivals.
Language: Python - Size: 1000 Bytes - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

Avatar-Home-Automation/A.V.A.T.A.R-Server
Agnostic Virtual Assistant for The Automated Residences
Language: JavaScript - Size: 22.2 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 3 - Forks: 0

ganeshguntaka5/Veronica_Chatbot
🤖 AI Chatbot with Voice Interface - A Flask web app featuring Groq-powered chat, voice input/output, and theme support. Combines natural language processing with speech synthesis for an interactive chat experience. #Python #Flask #AI #VoiceInterface
Language: HTML - Size: 12.7 KB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

BrunoHenrique00/ear
Ear is a desktop app that will help you transcribe what is playing on your computer!
Language: TypeScript - Size: 6.27 MB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 24 - Forks: 3

misyaguziya/VRCT
VRCT(VRChat Chatbox Translator & Transcription)
Language: JavaScript - Size: 59.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 195 - Forks: 19

echogarden-project/echogarden
Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice isolation, language detection and more.
Language: TypeScript - Size: 2.4 MB - Last synced at: about 20 hours ago - Pushed at: 26 days ago - Stars: 373 - Forks: 40

Sharrnah/whispering-ui
Native UI for the Whispering Tiger project - https://github.com/Sharrnah/whispering (live transcription / translation)
Language: Go - Size: 40.7 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 271 - Forks: 14

sandrohanea/whisper.net
Whisper.net. Speech to text made simple using Whisper Models
Language: C# - Size: 59.2 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 754 - Forks: 109

fnfurrcann/any-listen
A cross-platform private song playback service.
Size: 1.95 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

huggingface/distil-whisper
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
Language: Python - Size: 2.57 MB - Last synced at: 1 day ago - Pushed at: 6 months ago - Stars: 3,887 - Forks: 331

argmaxinc/WhisperKit
On-device Speech Recognition for Apple Silicon
Language: Swift - Size: 2.49 MB - Last synced at: 2 days ago - Pushed at: 11 days ago - Stars: 4,713 - Forks: 410

matthiasn/lotti
Achieve your goals and keep your data private with Lotti. This life tracking app is designed to help you stay motivated and on track, all while keeping your personal information safe and secure. Now with on-device speech recognition.
Language: Dart - Size: 45 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 517 - Forks: 53

alphacep/vosk-api
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Language: Jupyter Notebook - Size: 13.8 MB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 12,364 - Forks: 1,472

toverainc/willow-inference-server
Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS
Language: Python - Size: 3.27 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 455 - Forks: 48

HeyWillow/willow
Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative
Language: C - Size: 1.87 MB - Last synced at: 3 days ago - Pushed at: 19 days ago - Stars: 2,810 - Forks: 105

k2-fsa/sherpa
Speech-to-text server framework with next-gen Kaldi
Language: C++ - Size: 261 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 710 - Forks: 124

bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
Size: 9.34 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,260 - Forks: 101

VATHSAN08/Mental-Health-Sentiment-Analysis-using-Deep-Learning
This project leverages deep learning to classify mental health-related sentiments from text into seven categories: Anxiety, Bipolar, Depression, Normal, Personality Disorder, Stress, and Suicidal. By utilizing advanced NLP techniques, we aim to enhance understanding and support for mental well
Language: Jupyter Notebook - Size: 4.12 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

Evil0ctal/Fast-Powerful-Whisper-AI-Services-API
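⚡ A high-performance asynchronous API for automatic speech recognition (ASR) and translation. No need to pay for the Whisper API: inference runs on a locally hosted Whisper model, with multi-GPU concurrency and a design aimed at distributed deployment. It also includes built-in crawlers for social media platforms such as TikTok and Douyin, enabling seamless media processing across multiple platforms and providing a powerful, scalable solution for automated processing of media content.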
Language: Python - Size: 1.22 MB - Last synced at: about 18 hours ago - Pushed at: 6 days ago - Stars: 383 - Forks: 47

TensorSpeech/TensorFlowASR
⚡ TensorFlowASR: Almost state-of-the-art automatic speech recognition in TensorFlow 2. Supports languages that can use characters or subwords
Language: Python - Size: 90.3 MB - Last synced at: 1 day ago - Pushed at: 12 days ago - Stars: 983 - Forks: 242

gmovernight/COS760-ASR-Topic-Modeling-Group1
Optimizing Topic Modeling from Speech: Evaluating ASR and Topic Model Combinations for Setswana Podcasts
Language: Jupyter Notebook - Size: 1.3 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

exPHAT/SwiftWhisper
🎤 The easiest way to transcribe audio in Swift
Language: Swift - Size: 720 KB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 696 - Forks: 91

Avinraj01/SHL-Grammar-Scoring-Engine-for-Voice-Samples
This model predicts grammar scores (1–5) from audio files. It uses Whisper to transcribe speech to text, cleans the text, and extracts features with TF-IDF. A Random Forest Regressor is trained to learn grammar score patterns. Evaluation via Pearson Correlation showed good results.
Language: Jupyter Notebook - Size: 80.1 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

eellak/gsoc2019-sphinx
Creation of an online Greek mail dictation system, using Sphinx and personalized acoustic/language model training
Language: Python - Size: 2.5 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 19 - Forks: 2

ceuk/speech-recognition-aws-polyfill
Polyfill for the SpeechRecognition browser API using AWS Transcribe as a fallback
Language: TypeScript - Size: 686 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 10 - Forks: 9

mende237/Nda-Nda-Force-Aligner
Forced alignment of Nda' Nda', a Cameroonian language
Language: Shell - Size: 727 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 0

linagora-labs/asr_benchmark
Toolkit to benchmark various speech recognition APIs (NeMo, Whisper...) and visualize the results
Language: Jupyter Notebook - Size: 2.45 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 0

luisKING2008/Stream-Omni
Stream-Omni enables seamless interactions across text, vision, and speech using a large language model. This repository includes the model, datasets, and tools for developers to explore multimodal capabilities. 🌟🌐
Language: Python - Size: 10.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

NaomiProject/Naomi
The Naomi Project is an open source, technology agnostic platform for developing always-on, voice-controlled applications!
Language: Python - Size: 5.27 MB - Last synced at: about 17 hours ago - Pushed at: 5 months ago - Stars: 278 - Forks: 60

uhlive/python-sdk
Official Python SDK for uh!ive's Speech Recognition APIs
Language: Python - Size: 573 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 1

gtreshchev/RuntimeSpeechRecognizer 📦
Cross-platform, real-time, offline speech recognition plugin for Unreal Engine. Based on OpenAI's Whisper technology via whisper.cpp.
Language: C++ - Size: 24.8 MB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 297 - Forks: 46

RimAmarat/SpeechCommands
Speech Commands Recognition with very deep 1D CNN (M5 Architecture)
Language: Python - Size: 84 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

PaddlePaddle/PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Language: Python - Size: 69.2 MB - Last synced at: 5 days ago - Pushed at: 13 days ago - Stars: 11,996 - Forks: 1,920

DanteVela/Voice-Assistant-Python
This repository implements the GeeksforGeeks “Voice Assistant using Python” tutorial, showcasing a speech-driven virtual assistant powered by Speech Recognition for voice input, pyttsx3 for text-to-speech, Pywhatkit and webbrowser for web tasks, plus Wikipedia lookups and programmer jokes via pyjokes.
Language: Python - Size: 1000 Bytes - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

Blaizzy/mlx-audio
A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.
Language: Python - Size: 87.4 MB - Last synced at: 5 days ago - Pushed at: 13 days ago - Stars: 2,401 - Forks: 176

RimAmarat/RealTimeSpeechRec
Real-Time Speech Recognition with Voice Activity Detection using PyTorch
Language: Python - Size: 13.7 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

speechmatics/speechmatics-python
Python library and CLI for Speechmatics
Language: Python - Size: 2.97 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 74 - Forks: 21

tosinonikute/NotelyVoice
A Notes + AI Voice to Text Audio Transcription App built with Compose Multiplatform Android & iOS using Whisper AI, MVVM and Clean architecture, Jetpack Compose, Material3, Dagger hilt, SQLDelight, Coroutines and Flow
Language: C++ - Size: 76.1 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 70 - Forks: 2

miekadal4/top-coder-challenge
Reverse-engineer a 60-year-old travel reimbursement system using historical data and employee interviews. Your mission is to replicate its behavior based on 1,000 examples and uncover the original business logic. 🛠️💻
Language: Shell - Size: 75.2 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

openvinotoolkit/openvino
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
Language: C++ - Size: 850 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 8,445 - Forks: 2,631

hyeonsangjeon/computing-Korean-STT-error-rates
Python function package for computing the character error rate (CER) and word error rate (WER) of output transcripts from Korean STT sentence recognizers
Language: Python - Size: 125 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 63 - Forks: 10

pragmatrix/context-switch
Audio Streaming for FreeSWITCH with backends powered by Azure, OpenAI, and Aristech
Language: Rust - Size: 428 KB - Last synced at: 2 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 1

lhotse-speech/lhotse
Tools for handling multimodal data in machine learning projects.
Language: Python - Size: 31.6 MB - Last synced at: 3 days ago - Pushed at: 11 days ago - Stars: 1,028 - Forks: 235

dudarev/speechdown
CLI tool to transcribe your spoken audio notes into timestamped, multilingual Markdown—offline, accurate, and feedback-driven.
Language: Python - Size: 341 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

techAli1996/wakeword
ESP32S3 Wakeword/Keyword Spotting starter project with ready to go ML model
Language: C - Size: 4.68 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

mutablelogic/go-whisper
Speech-to-Text in golang
Language: Go - Size: 8.17 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 114 - Forks: 11

ashishpatel26/Treasure-of-Transformers
💁 Awesome Treasure of Transformers Models for Natural Language processing contains papers, videos, blogs, official repo along with colab Notebooks. 🛫☑️
Language: Jupyter Notebook - Size: 370 KB - Last synced at: 5 days ago - Pushed at: 11 months ago - Stars: 1,001 - Forks: 213

m-bain/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Language: Python - Size: 38.7 MB - Last synced at: 5 days ago - Pushed at: 15 days ago - Stars: 16,296 - Forks: 1,749

edenai/edenai-apis
Eden AI: simplify the use and deployment of AI technologies by providing a unique API that connects to the best possible AI engines
Language: Python - Size: 158 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 449 - Forks: 67

DmitryRyumin/INTERSPEECH-2023-24-Papers
INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!
Size: 11.4 MB - Last synced at: 2 days ago - Pushed at: 6 months ago - Stars: 674 - Forks: 42

Saik0s/Whisperboard
The open-source iOS app that's making quality voice transcription more accessible on mobile devices.
Language: Swift - Size: 179 MB - Last synced at: 5 days ago - Pushed at: 9 months ago - Stars: 870 - Forks: 88

03-JS/PySpeech
Automatic speech recognition API for Lethal Company
Language: C# - Size: 26.4 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

deepgram/deepgram-dotnet-sdk
Official .NET SDK for Deepgram.
Language: C# - Size: 6.75 MB - Last synced at: 5 days ago - Pushed at: 10 days ago - Stars: 42 - Forks: 34

baomeomeo/speech
A Speech-To-Text (with translation) library for Go; currently uses Whisper (runs locally if needed; no API keys required)
Size: 1000 Bytes - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2 - Forks: 0

kaka-lin/ASR-notes
A practical collection of ASR models and tools — including Whisper variants and Google STT — with implementations for real-time, batch transcription, and multi-platform integration.
Language: Python - Size: 311 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

lobehub/lobe-tts
🎤 Lobe TTS - A high-quality & reliable TTS/STT library for Server and Browser
Language: TypeScript - Size: 390 KB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 627 - Forks: 78

flashlight/wav2letter
Facebook AI Research's Automatic Speech Recognition Toolkit
Language: C++ - Size: 6.2 MB - Last synced at: 5 days ago - Pushed at: 7 months ago - Stars: 6,432 - Forks: 1,010

ritikpandey01/Jarvis-X
🔍 Jarvis is a personal AI assistant that brings together conversation, automation, and creativity in one place. It can chat intelligently, search the web in real-time, generate images, control your system, and respond to voice commands - all through natural language interaction.
Language: Python - Size: 63.5 KB - Last synced at: 4 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

ggml-org/whisper.cpp
Port of OpenAI's Whisper model in C/C++
Language: C++ - Size: 23 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 40,827 - Forks: 4,355

Kaljurand/K6nele
An Android app that offers speech-to-text user interfaces to other apps
Language: Java - Size: 24.5 MB - Last synced at: 3 days ago - Pushed at: 8 days ago - Stars: 282 - Forks: 82

thewh1teagle/sherpa-rs
Rust bindings to https://github.com/k2-fsa/sherpa-onnx
Language: Rust - Size: 1.49 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 188 - Forks: 28

mozilla/DeepSpeech
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.
Language: C++ - Size: 48.2 MB - Last synced at: 7 days ago - Pushed at: 10 months ago - Stars: 26,444 - Forks: 4,055

microsoft/SpeechT5
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
Language: Python - Size: 17.8 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 1,369 - Forks: 127

ictnlp/Stream-Omni
Stream-Omni is an end-to-end language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.
Language: Python - Size: 10.6 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0
