An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: speech-recognition

MeowMeowSE3/language-detection-ai

Detect 18+ languages instantly using machine learning (BERT, LSTM, SVM) and NLP. Includes a Flask web app for real-time predictions, trained models, and detailed notebooks.

Language: Jupyter Notebook - Size: 1.76 MB - Last synced at: about 2 hours ago - Pushed at: about 3 hours ago - Stars: 0 - Forks: 1

ShakeelKhalid1913/SignTalkApp

Language: Dart - Size: 48.7 MB - Last synced at: about 3 hours ago - Pushed at: about 5 hours ago - Stars: 3 - Forks: 0

aniemore/Aniemore

Emotion recognition from audio and text files (Russian language only)

Language: Python - Size: 2.11 MB - Last synced at: about 4 hours ago - Pushed at: about 5 hours ago - Stars: 71 - Forks: 8

z430/keyword-spotting

Implementation of keyword spotting (wake word detection)

Language: Python - Size: 210 KB - Last synced at: about 5 hours ago - Pushed at: about 6 hours ago - Stars: 1 - Forks: 0

MadhupriyaAdapa/Voice-Assistant

Jarvis is a Python-based voice assistant that responds to voice commands to perform tasks like web browsing, fetching information, managing system operations, and more. It uses speech recognition and text-to-speech to interact naturally with the user.

Language: Python - Size: 2.93 KB - Last synced at: about 5 hours ago - Pushed at: about 6 hours ago - Stars: 0 - Forks: 0

CentralLabFacilities/speech_recognition

Language: Python - Size: 149 KB - Last synced at: about 7 hours ago - Pushed at: about 8 hours ago - Stars: 0 - Forks: 1

huggingface/transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Language: Python - Size: 307 MB - Last synced at: about 8 hours ago - Pushed at: about 8 hours ago - Stars: 145,965 - Forks: 29,434

pradipchaudhary/ainepal

🎓 Learn | 📢 Aware | 💡 Inform – Helping Nepal Live Smarter

Language: TypeScript - Size: 1.37 MB - Last synced at: about 8 hours ago - Pushed at: about 9 hours ago - Stars: 2 - Forks: 1

sine2pi/Maxfactor

"Remember kids, there are three ways to do things: the right way, the wrong way, and the Max Power way."

Language: Python - Size: 221 KB - Last synced at: about 9 hours ago - Pushed at: about 10 hours ago - Stars: 0 - Forks: 0

Konjevod1/OtosakuStreamingASR-iOS

OtosakuStreamingASR-iOS offers a simple way to integrate real-time speech recognition into your iOS apps. With its efficient on-device processing, you can enhance user experiences without relying on internet connectivity. 🐙✨

Language: Swift - Size: 1.64 MB - Last synced at: about 10 hours ago - Pushed at: about 11 hours ago - Stars: 0 - Forks: 0

JackySDEC/Speech-to-Text_Traditional_Method

Explore a speech recognition system using HMM and GMM techniques. Classify spoken words from audio with feature extraction methods. 🌐🎤

Language: Jupyter Notebook - Size: 524 KB - Last synced at: about 13 hours ago - Pushed at: about 14 hours ago - Stars: 0 - Forks: 0

asifalam6310/python-basics-practice

python-basics-practice: This repository showcases my Python learning journey through CodeWithHarry tutorials. Each file addresses fundamental topics, helping others learn alongside me. 🐍📚

Language: Python - Size: 16.6 KB - Last synced at: about 13 hours ago - Pushed at: about 15 hours ago - Stars: 0 - Forks: 0

linto-ai/whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

Language: Python - Size: 4.49 MB - Last synced at: about 12 hours ago - Pushed at: 3 months ago - Stars: 2,460 - Forks: 189

saharmor/whisper-playground

Build real time speech2text web apps using OpenAI's Whisper https://openai.com/blog/whisper/

Language: Python - Size: 407 KB - Last synced at: about 19 hours ago - Pushed at: about 1 year ago - Stars: 815 - Forks: 140

arkam-ahamed/AI-Tutor

AI English Tutor - Voice-powered conversational English practice app with real-time speech recognition, AI feedback, and text-to-speech. Built with React frontend and Spring Boot backend using Google Gemini AI.

Language: JavaScript - Size: 0 Bytes - Last synced at: about 22 hours ago - Pushed at: about 23 hours ago - Stars: 0 - Forks: 0

FL33TW00D/whisper-turbo

Cross-Platform, GPU Accelerated Whisper 🏎️

Language: TypeScript - Size: 3.83 MB - Last synced at: about 17 hours ago - Pushed at: over 1 year ago - Stars: 1,801 - Forks: 82

mbjmgjgjgj/Hogwarts-Legacy-Voice-Spellcaster

🎙️ Hogwarts Legacy Voice Spellcaster. 🧙‍♂️ Project description: voice-controlled magic for the game `Hogwarts Legacy`. This project lets you cast spells in the game with your voice, using Python and Vosk for speech recognition.

Language: Python - Size: 55.6 MB - Last synced at: about 23 hours ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

KChantal/SignBridge

Building Bridges for Inclusive Communication

Language: TypeScript - Size: 262 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 2 - Forks: 0

default741/speak-sense

AI-powered tool for detecting spoken languages from audio using machine learning, deep learning, and signal processing techniques.

Language: Jupyter Notebook - Size: 285 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

heypoom/archival-intelligences

Archival Intelligences. (TBA)

Language: TypeScript - Size: 5.74 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 3 - Forks: 1

leon-ai/leon

🧠 Leon is your open-source personal assistant.

Language: TypeScript - Size: 21.3 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 16,397 - Forks: 1,362

Picovoice/porcupine

On-device wake word detection powered by deep learning

Language: Python - Size: 339 MB - Last synced at: 1 day ago - Pushed at: 3 days ago - Stars: 4,195 - Forks: 533

Olney1/ChatGPT-OpenAI-Smart-Speaker

This AI Smart Speaker uses speech recognition, TTS (text-to-speech), and STT (speech-to-text) to enable voice and vision-driven conversations, with additional web search capabilities via OpenAI and LangChain agents.

Language: Python - Size: 145 MB - Last synced at: about 4 hours ago - Pushed at: 7 months ago - Stars: 293 - Forks: 33

NamelessWonderer0/Chat-app

Real-time chat app built with the MERN stack and Socket.io for instant messaging. Connect and chat with multiple users seamlessly! 🐙💻

Language: JavaScript - Size: 6.32 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

sine2pi/asr_model

NLP model with acoustic positional encoding.

Language: Python - Size: 638 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

newcomerxD/server-gen

Server-Gen is a versatile tool that helps you monitor system metrics and sends regular reports via email. With features like cron-style scheduling and a built-in health check endpoint, it ensures your server stays in top shape. 🐙✨

Language: Go - Size: 39.1 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

selmonix/OtosakuKWS-iOS

OtosakuKWS is a lightweight, on-device keyword spotting engine for iOS that detects speech commands in real time. This project relies on a CRNN CoreML model for efficient and accurate voice command recognition. 🐙🌟

Language: Swift - Size: 1.24 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

proger/uk

Phonograms and syntagms: processing tools

Language: Python - Size: 8.11 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 21 - Forks: 0

Swap98-Coder/mlx-audio

A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.

Size: 1.95 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 2 - Forks: 0

alan-ai/alan-sdk-reactnative

The Self-Coding System for Your App — Alan AI SDK for React Native

Language: Ruby - Size: 188 MB - Last synced at: about 3 hours ago - Pushed at: 2 months ago - Stars: 587 - Forks: 16

NVIDIA/DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

Language: Jupyter Notebook - Size: 104 MB - Last synced at: 1 day ago - Pushed at: 11 months ago - Stars: 14,346 - Forks: 3,349

nl8590687/ASRT_SpeechRecognition

A Deep-Learning-Based Chinese Speech Recognition System

Language: Python - Size: 7.77 MB - Last synced at: 1 day ago - Pushed at: 9 months ago - Stars: 8,153 - Forks: 1,911

rakib-0/SubtitleGenerator

SubtitleGenerator is an interactive tool that automatically generates and translates subtitles for your videos using AI. It supports multiple languages and formats, making it easy to enhance your video content. 🛠️✨

Language: Python - Size: 27.3 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 1

k2-fsa/sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn, without an Internet connection. Supports iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A, etc.

Language: C++ - Size: 2.08 MB - Last synced at: about 5 hours ago - Pushed at: 24 days ago - Stars: 1,369 - Forks: 185

SahilAggarwal2004/react-text-to-speech

An easy-to-use React.js component that leverages the Web Speech API to convert text to speech.

Language: TypeScript - Size: 2.44 MB - Last synced at: about 14 hours ago - Pushed at: 4 days ago - Stars: 64 - Forks: 5

mkiol/dsnote

Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.

Language: C++ - Size: 76 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 937 - Forks: 39

Brooklyn-Dev/Ultron-AI

Voice-controlled AI gaming assistant for Marvel Rivals.

Language: Python - Size: 1000 Bytes - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 1 - Forks: 0

Avatar-Home-Automation/A.V.A.T.A.R-Server

Agnostic Virtual Assistant for The Automated Residences

Language: JavaScript - Size: 22.2 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 3 - Forks: 0

ganeshguntaka5/Veronica_Chatbot

🤖 AI Chatbot with Voice Interface - A Flask web app featuring Groq-powered chat, voice input/output, and theme support. Combines natural language processing with speech synthesis for an interactive chat experience. #Python #Flask #AI #VoiceInterface

Language: HTML - Size: 12.7 KB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

BrunoHenrique00/ear

Ear is a desktop app that will help you transcribe what is playing on your computer!

Language: TypeScript - Size: 6.27 MB - Last synced at: 1 day ago - Pushed at: over 2 years ago - Stars: 24 - Forks: 3

misyaguziya/VRCT

VRCT (VRChat Chatbox Translator & Transcription)

Language: JavaScript - Size: 59.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 195 - Forks: 19

echogarden-project/echogarden

Cross-platform speech toolset, used from the command-line or as a Node.js library. Includes a variety of engines for speech synthesis, speech recognition, forced alignment, speech translation, voice isolation, language detection and more.

Language: TypeScript - Size: 2.4 MB - Last synced at: about 20 hours ago - Pushed at: 26 days ago - Stars: 373 - Forks: 40

Sharrnah/whispering-ui

Native UI for the Whispering Tiger project - https://github.com/Sharrnah/whispering (live transcription / translation)

Language: Go - Size: 40.7 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 271 - Forks: 14

sandrohanea/whisper.net

Whisper.net: speech to text made simple using Whisper models

Language: C# - Size: 59.2 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 754 - Forks: 109

fnfurrcann/any-listen

A cross-platform private song playback service.

Size: 1.95 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

huggingface/distil-whisper

Distilled variant of Whisper for speech recognition: 6x faster and 50% smaller, while staying within 1% word error rate (WER) of Whisper.

Language: Python - Size: 2.57 MB - Last synced at: 1 day ago - Pushed at: 6 months ago - Stars: 3,887 - Forks: 331

argmaxinc/WhisperKit

On-device Speech Recognition for Apple Silicon

Language: Swift - Size: 2.49 MB - Last synced at: 2 days ago - Pushed at: 11 days ago - Stars: 4,713 - Forks: 410

matthiasn/lotti

Achieve your goals and keep your data private with Lotti. This life tracking app is designed to help you stay motivated and on track, all while keeping your personal information safe and secure. Now with on-device speech recognition.

Language: Dart - Size: 45 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 517 - Forks: 53

alphacep/vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

Language: Jupyter Notebook - Size: 13.8 MB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 12,364 - Forks: 1,472

toverainc/willow-inference-server

Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS

Language: Python - Size: 3.27 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 455 - Forks: 48

HeyWillow/willow

Open source, local, and self-hosted Amazon Echo/Google Home competitive Voice Assistant alternative

Language: C - Size: 1.87 MB - Last synced at: 3 days ago - Pushed at: 19 days ago - Stars: 2,810 - Forks: 105

k2-fsa/sherpa

Speech-to-text server framework with next-gen Kaldi

Language: C++ - Size: 261 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 710 - Forks: 124

bytedance/SALMONN

SALMONN family: A suite of advanced multi-modal LLMs

Size: 9.34 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1,260 - Forks: 101

VATHSAN08/Mental-Health-Sentiment-Analysis-using-Deep-Learning

Mental Health Sentiment Analysis using Deep Learning: This project leverages deep learning to classify mental health-related sentiments from text into seven categories: Anxiety, Bipolar, Depression, Normal, Personality Disorder, Stress, and Suicidal. By utilizing advanced NLP techniques, we aim to enhance understanding and support for mental well…

Language: Jupyter Notebook - Size: 4.12 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

Evil0ctal/Fast-Powerful-Whisper-AI-Services-API

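⚡ A high-performance asynchronous API for automatic speech recognition (ASR) and translation. There is no need to buy access to the Whisper API: inference runs on locally hosted Whisper models, supports multi-GPU concurrency, and is designed for distributed deployment. It also has built-in crawlers for social media platforms including TikTok and Douyin, enabling seamless processing of media from multiple social platforms and providing a powerful, scalable solution for automated processing of media content data.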

Language: Python - Size: 1.22 MB - Last synced at: about 18 hours ago - Pushed at: 6 days ago - Stars: 383 - Forks: 47

TensorSpeech/TensorFlowASR

⚡ TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in TensorFlow 2. Supports languages that can be represented with characters or subwords.

Language: Python - Size: 90.3 MB - Last synced at: 1 day ago - Pushed at: 12 days ago - Stars: 983 - Forks: 242

gmovernight/COS760-ASR-Topic-Modeling-Group1

Optimizing Topic Modeling from Speech: Evaluating ASR and Topic Model Combinations for Setswana Podcasts

Language: Jupyter Notebook - Size: 1.3 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

exPHAT/SwiftWhisper

🎤 The easiest way to transcribe audio in Swift

Language: Swift - Size: 720 KB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 696 - Forks: 91

Avinraj01/SHL-Grammar-Scoring-Engine-for-Voice-Samples

This model predicts grammar scores (1–5) from audio files. It uses Whisper to transcribe speech to text, cleans the text, and extracts features with TF-IDF. A Random Forest Regressor is trained to learn grammar score patterns. Evaluation via Pearson Correlation showed good results.

Language: Jupyter Notebook - Size: 80.1 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

eellak/gsoc2019-sphinx

Creation of an online Greek mail dictation system, using Sphinx and personalized acoustic/language model training

Language: Python - Size: 2.5 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 19 - Forks: 2

ceuk/speech-recognition-aws-polyfill

Polyfill for the SpeechRecognition browser API using AWS Transcribe as a fallback

Language: TypeScript - Size: 686 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 10 - Forks: 9

mende237/Nda-Nda-Force-Aligner

Forced alignment for Nda’ Nda’, a Cameroonian language

Language: Shell - Size: 727 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 0

linagora-labs/asr_benchmark

Toolkit to benchmark various speech recognition APIs (NeMo, Whisper...) and visualize the results

Language: Jupyter Notebook - Size: 2.45 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 0

luisKING2008/Stream-Omni

Stream-Omni enables seamless interactions across text, vision, and speech using a large language model. This repository includes the model, datasets, and tools for developers to explore multimodal capabilities. 🌟🌐

Language: Python - Size: 10.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

NaomiProject/Naomi

The Naomi Project is an open source, technology agnostic platform for developing always-on, voice-controlled applications!

Language: Python - Size: 5.27 MB - Last synced at: about 17 hours ago - Pushed at: 5 months ago - Stars: 278 - Forks: 60

uhlive/python-sdk

Official Python SDK for uh!ive's Speech Recognition APIs

Language: Python - Size: 573 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 1

gtreshchev/RuntimeSpeechRecognizer 📦

Cross-platform, real-time, offline speech recognition plugin for Unreal Engine, based on OpenAI's Whisper via whisper.cpp.

Language: C++ - Size: 24.8 MB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 297 - Forks: 46

RimAmarat/SpeechCommands

Speech Commands Recognition with very deep 1D CNN (M5 Architecture)

Language: Python - Size: 84 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

PaddlePaddle/PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Language: Python - Size: 69.2 MB - Last synced at: 5 days ago - Pushed at: 13 days ago - Stars: 11,996 - Forks: 1,920

DanteVela/Voice-Assistant-Python

This repository implements the GeeksforGeeks “Voice Assistant using Python” tutorial, showcasing a speech-driven virtual assistant powered by Speech Recognition for voice input, pyttsx3 for text-to-speech, Pywhatkit and webbrowser for web tasks, plus Wikipedia lookups and programmer jokes via pyjokes.

Language: Python - Size: 1000 Bytes - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

Blaizzy/mlx-audio

A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.

Language: Python - Size: 87.4 MB - Last synced at: 5 days ago - Pushed at: 13 days ago - Stars: 2,401 - Forks: 176

RimAmarat/RealTimeSpeechRec

Real-Time Speech Recognition with Voice Activity Detection using PyTorch

Language: Python - Size: 13.7 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

speechmatics/speechmatics-python

Python library and CLI for Speechmatics

Language: Python - Size: 2.97 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 74 - Forks: 21

tosinonikute/NotelyVoice

A Notes + AI Voice to Text Audio Transcription App built with Compose Multiplatform Android & iOS using Whisper AI, MVVM and Clean architecture, Jetpack Compose, Material3, Dagger Hilt, SQLDelight, Coroutines and Flow

Language: C++ - Size: 76.1 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 70 - Forks: 2

miekadal4/top-coder-challenge

Reverse-engineer a 60-year-old travel reimbursement system using historical data and employee interviews. Your mission is to replicate its behavior based on 1,000 examples and uncover the original business logic. 🛠️💻

Language: Shell - Size: 75.2 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

openvinotoolkit/openvino

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

Language: C++ - Size: 850 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 8,445 - Forks: 2,631

hyeonsangjeon/computing-Korean-STT-error-rates

A Python function package that computes the character error rate (CER) and word error rate (WER) of Korean sentence transcripts output by STT recognizers

Language: Python - Size: 125 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 63 - Forks: 10

pragmatrix/context-switch

Audio Streaming for FreeSWITCH with backends powered by Azure, OpenAI, and Aristech

Language: Rust - Size: 428 KB - Last synced at: 2 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 1

lhotse-speech/lhotse

Tools for handling multimodal data in machine learning projects.

Language: Python - Size: 31.6 MB - Last synced at: 3 days ago - Pushed at: 11 days ago - Stars: 1,028 - Forks: 235

dudarev/speechdown

CLI tool to transcribe your spoken audio notes into timestamped, multilingual Markdown—offline, accurate, and feedback-driven.

Language: Python - Size: 341 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

techAli1996/wakeword

ESP32S3 Wakeword/Keyword Spotting starter project with ready to go ML model

Language: C - Size: 4.68 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

mutablelogic/go-whisper

Speech-to-Text in golang

Language: Go - Size: 8.17 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 114 - Forks: 11

ashishpatel26/Treasure-of-Transformers

💁 Awesome Treasure of Transformers Models for Natural Language Processing: contains papers, videos, blogs, and official repos, along with Colab notebooks. 🛫☑️

Language: Jupyter Notebook - Size: 370 KB - Last synced at: 5 days ago - Pushed at: 11 months ago - Stars: 1,001 - Forks: 213

m-bain/whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Language: Python - Size: 38.7 MB - Last synced at: 5 days ago - Pushed at: 15 days ago - Stars: 16,296 - Forks: 1,749

edenai/edenai-apis

Eden AI: simplify the use and deployment of AI technologies by providing a unique API that connects to the best possible AI engines

Language: Python - Size: 158 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 449 - Forks: 67

DmitryRyumin/INTERSPEECH-2023-24-Papers

INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!

Size: 11.4 MB - Last synced at: 2 days ago - Pushed at: 6 months ago - Stars: 674 - Forks: 42

Saik0s/Whisperboard

The open-source iOS app that's making quality voice transcription more accessible on mobile devices.

Language: Swift - Size: 179 MB - Last synced at: 5 days ago - Pushed at: 9 months ago - Stars: 870 - Forks: 88

03-JS/PySpeech

Automatic speech recognition API for Lethal Company

Language: C# - Size: 26.4 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

deepgram/deepgram-dotnet-sdk

Official .NET SDK for Deepgram.

Language: C# - Size: 6.75 MB - Last synced at: 5 days ago - Pushed at: 10 days ago - Stars: 42 - Forks: 34

baomeomeo/speech

A Speech-To-Text (with translation) library for Go; currently uses Whisper (runs locally if needed; no API keys required)

Size: 1000 Bytes - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 2 - Forks: 0

kaka-lin/ASR-notes

A practical collection of ASR models and tools — including Whisper variants and Google STT — with implementations for real-time, batch transcription, and multi-platform integration.

Language: Python - Size: 311 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

lobehub/lobe-tts

🎤 Lobe TTS - A high-quality & reliable TTS/STT library for Server and Browser

Language: TypeScript - Size: 390 KB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 627 - Forks: 78

flashlight/wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit

Language: C++ - Size: 6.2 MB - Last synced at: 5 days ago - Pushed at: 7 months ago - Stars: 6,432 - Forks: 1,010

ritikpandey01/Jarvis-X

🔍 Jarvis is a personal AI assistant that brings together conversation, automation, and creativity in one place. It can chat intelligently, search the web in real-time, generate images, control your system, and respond to voice commands - all through natural language interaction.

Language: Python - Size: 63.5 KB - Last synced at: 4 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

ggml-org/whisper.cpp

Port of OpenAI's Whisper model in C/C++

Language: C++ - Size: 23 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 40,827 - Forks: 4,355

Kaljurand/K6nele

An Android app that offers speech-to-text user interfaces to other apps

Language: Java - Size: 24.5 MB - Last synced at: 3 days ago - Pushed at: 8 days ago - Stars: 282 - Forks: 82

thewh1teagle/sherpa-rs

Rust bindings to https://github.com/k2-fsa/sherpa-onnx

Language: Rust - Size: 1.49 MB - Last synced at: 5 days ago - Pushed at: about 1 month ago - Stars: 188 - Forks: 28

mozilla/DeepSpeech

DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

Language: C++ - Size: 48.2 MB - Last synced at: 7 days ago - Pushed at: 10 months ago - Stars: 26,444 - Forks: 4,055

microsoft/SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Language: Python - Size: 17.8 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 1,369 - Forks: 127

ictnlp/Stream-Omni

Stream-Omni is an end-to-end language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.

Language: Python - Size: 10.6 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

Related Keywords
speech-recognition (4,922), speech-to-text (1,746), python (1,068), asr (575), speech (518), deep-learning (498), text-to-speech (460), machine-learning (429), speech-synthesis (367), ai (317), python3 (308), audio (307), voice-recognition (288), voice-assistant (287), tts (262), nlp (245), whisper (241), speech-processing (199), natural-language-processing (192), tensorflow (191), javascript (185), pytorch (180), artificial-intelligence (177), chatbot (176), openai (168), dataset (155), pyttsx3 (150), android (143), audio-processing (137), stt (136), automatic-speech-recognition (134), react (131), voice (127), voice-commands (102), pyaudio (101), transcription (94), virtual-assistant (92), typescript (91), voice-control (89), api (85), llm (84), assistant (84), nodejs (83), translation (80), kaldi (78), java (77), ios (76), computer-vision (75), reactjs (72), automation (72), neural-network (70), flask (70), swift (69), chatgpt (68), raspberry-pi (66), asr-model (64), html (63), deep-neural-networks (60), css (59), vosk (59), keras (59), cnn (57), jarvis (55), tailwindcss (55), whisper-ai (54), deepspeech (52), sentiment-analysis (52), transformer (52), gtts (51), wav2vec2 (50), speechrecognition (49), emotion-recognition (49), web-speech-api (47), tkinter (47), transformers (47), azure (45), recognition (45), neural-networks (45), nlp-machine-learning (43), html5 (43), sdk (43), language-model (42), machine-translation (42), bot (42), hacktoberfest (42), docker (41), convolutional-neural-networks (41), speaker-recognition (41), huggingface (41), openai-api (40), personal-assistant (40), speech-api (40), openai-whisper (40), conversational-ai (39), speech-analysis (39), streamlit (38), speech-recognizer (38), csharp (37), face-recognition (37), voice-activity-detection (37)