asr | Topic | Ecosyste.ms: Repos

Topic: "asr"

m-bain/whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Language: Python - Size: 38.6 MB - Last synced at: 6 days ago - Pushed at: 8 days ago - Stars: 15,321 - Forks: 1,657

NVIDIA/NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python - Size: 435 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 13,794 - Forks: 2,806

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.

Language: Python - Size: 69.4 MB - Last synced at: 10 days ago - Pushed at: 20 days ago - Stars: 11,845 - Forks: 1,904

speechbrain/speechbrain

A PyTorch-based Speech Toolkit

Language: Python - Size: 97.8 MB - Last synced at: 5 days ago - Pushed at: 15 days ago - Stars: 9,783 - Forks: 1,485

alphacep/vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

Language: Jupyter Notebook - Size: 13.8 MB - Last synced at: 6 days ago - Pushed at: 10 days ago - Stars: 9,406 - Forks: 1,266

wzpan/wukong-robot

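🤖 wukong-robot is a simple, flexible, and elegant Chinese voice-dialogue robot / smart speaker project. It supports multi-turn ChatGPT conversations and may be the first open-source smart speaker project to support brain-computer interaction.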

Language: Python - Size: 10.1 MB - Last synced at: about 14 hours ago - Pushed at: 7 months ago - Stars: 6,822 - Forks: 1,387

k2-fsa/sherpa-onnx

Speech-to-text, text-to-speech, speaker diarization, speech enhancement, and VAD using next-gen Kaldi with onnxruntime, without an Internet connection. Supports embedded systems, Android, iOS, HarmonyOS, Raspberry Pi, RISC-V, x86_64 servers, and a websocket server/client, with support for 11 programming languages

Language: C++ - Size: 9.1 MB - Last synced at: 5 days ago - Pushed at: 10 days ago - Stars: 5,836 - Forks: 659

FunAudioLLM/SenseVoice

Multilingual Voice Understanding Model

Language: Python - Size: 6.51 MB - Last synced at: 4 days ago - Pushed at: about 2 months ago - Stars: 5,556 - Forks: 495

snakers4/silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

Language: Jupyter Notebook - Size: 488 KB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 5,253 - Forks: 336

xiangyuecn/Recorder

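HTML5/JS audio recording in mp3, wav, ogg, webm, amr, g711a, and g711u formats. Supports PC, some Android and iOS browsers, Hybrid Apps (Android and iOS app source code provided), and WeChat. Provides ASR speech-to-text, an H5 voice call/chat example, and DTMF encoding and decoding.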

Language: JavaScript - Size: 13.2 MB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 5,221 - Forks: 1,068

NexaAI/nexa-sdk

Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR), and text-to-speech (TTS) capabilities.

Language: Python - Size: 195 MB - Last synced at: 3 days ago - Pushed at: 2 months ago - Stars: 4,531 - Forks: 628

wenet-e2e/wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

Language: Python - Size: 24.3 MB - Last synced at: 18 days ago - Pushed at: about 1 month ago - Stars: 4,461 - Forks: 1,131

MahmoudAshraf97/whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

Language: Jupyter Notebook - Size: 435 KB - Last synced at: 13 days ago - Pushed at: 19 days ago - Stars: 4,432 - Forks: 411

jdepoix/youtube-transcript-api

This is a Python API which allows you to get the transcript/subtitles for a given YouTube video. It also works for automatically generated subtitles, and it requires neither an API key nor a headless browser, unlike other Selenium-based solutions!

Language: Python - Size: 1.97 MB - Last synced at: 4 days ago - Pushed at: 18 days ago - Stars: 3,855 - Forks: 444

PeterH0323/Streamer-Sales

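Streamer-Sales (销冠, "top seller"): an LLM for live-stream sales hosts 🛒🎁 that presents a product from its given features in a way meant to spark viewers' desire to buy. 🚀⭐ Includes a detailed data generation pipeline❗ 📦 Also integrates LMDeploy accelerated inference 🚀, RAG (retrieval-augmented generation) 📚, TTS (text-to-speech) 🔊, digital human generation 🦸, an Agent that looks up real-time information on the web 🌐, ASR (speech-to-text) 🎙️, a Vue-based frontend 🍍, a FastAPI backend 🗝️, and Docker Compose packaging and deployment 🐋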

Language: Python - Size: 68.6 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 3,150 - Forks: 490

tensorflow/lingvo

Lingvo

Language: Python - Size: 142 MB - Last synced at: 3 days ago - Pushed at: 11 days ago - Stars: 2,838 - Forks: 451

ahmetoner/whisper-asr-webservice

OpenAI Whisper ASR Webservice API

Language: Python - Size: 1.76 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 2,517 - Forks: 449

coqui-ai/STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

Language: C++ - Size: 53.4 MB - Last synced at: 6 days ago - Pushed at: about 1 year ago - Stars: 2,425 - Forks: 286

mravanelli/pytorch-kaldi

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.

Language: Python - Size: 567 KB - Last synced at: 27 days ago - Pushed at: about 3 years ago - Stars: 2,384 - Forks: 445

linto-ai/whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

Language: Python - Size: 4.49 MB - Last synced at: 17 days ago - Pushed at: about 1 month ago - Stars: 2,368 - Forks: 181

CheshireCC/faster-whisper-GUI

faster_whisper GUI with PySide6

Language: Python - Size: 94.6 MB - Last synced at: 29 days ago - Pushed at: 5 months ago - Stars: 2,282 - Forks: 136

Purfview/whisper-standalone-win

Whisper & Faster-Whisper standalone executables for those who don't want to bother with Python.

Size: 207 KB - Last synced at: 28 days ago - Pushed at: 29 days ago - Stars: 1,925 - Forks: 90

Delta-ML/delta

DELTA is a deep learning based natural language and speech processing platform. LF AI & DATA Projects: https://lfaidata.foundation/projects/delta/

Language: Python - Size: 59.5 MB - Last synced at: 29 days ago - Pushed at: about 2 months ago - Stars: 1,590 - Forks: 288

harry0703/AudioNotes

Quickly extract the content of audio and video and organize it into structured Markdown notes

Language: Python - Size: 732 KB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 1,583 - Forks: 228

umlx5h/LLPlayer

The media player for language learning, with dual subtitles, AI-generated subtitles, real-time translation, and more!

Language: C# - Size: 108 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1,502 - Forks: 78

k2-fsa/sherpa-ncnn

Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn, without an Internet connection. Supports iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A, etc.

Language: C++ - Size: 2.06 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1,317 - Forks: 178

lenML/Speech-AI-Forge

🍦 Speech-AI-Forge is a project developed around TTS generation model, implementing an API Server and a Gradio-based WebUI.

Language: Python - Size: 10.5 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1,208 - Forks: 161

mravanelli/SincNet

SincNet is a neural architecture for efficiently processing raw audio samples.

Language: Python - Size: 78.9 MB - Last synced at: 14 days ago - Pushed at: about 4 years ago - Stars: 1,174 - Forks: 265

R3gm/SoniTranslate

Synchronized Translation for Videos. Video dubbing

Language: Python - Size: 19.4 MB - Last synced at: 30 days ago - Pushed at: 3 months ago - Stars: 1,096 - Forks: 226

ictnlp/StreamSpeech

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.

Language: Python - Size: 18.2 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 1,053 - Forks: 80

yeyupiaoling/Whisper-Finetune

Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training without speech data. Accelerate inference and support Web deployment, Windows desktop deployment, and Android deployment

Language: C - Size: 5.3 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 1,024 - Forks: 169

sooftware/conformer

[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)

Language: Python - Size: 2.81 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 1,018 - Forks: 184

pykaldi/pykaldi

A Python wrapper for Kaldi

Language: Python - Size: 2.68 MB - Last synced at: 28 days ago - Pushed at: 4 months ago - Stars: 1,012 - Forks: 246

athena-team/athena

an open-source implementation of sequence-to-sequence based speech processing engine

Language: C++ - Size: 9.94 MB - Last synced at: 4 days ago - Pushed at: over 2 years ago - Stars: 947 - Forks: 189

freewym/espresso

Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

Language: Python - Size: 17.2 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 942 - Forks: 116

alphacep/vosk-server

WebSocket, gRPC and WebRTC speech recognition server based on Vosk and Kaldi libraries

Language: Python - Size: 1.18 MB - Last synced at: 6 months ago - Pushed at: 8 months ago - Stars: 934 - Forks: 249

innovatorved/whisper.api

This project provides an API with user level access support to transcribe speech to text using a finetuned and processed Whisper ASR model.

Language: Python - Size: 521 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 884 - Forks: 38

mkiol/dsnote

Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.

Language: C++ - Size: 75.2 MB - Last synced at: about 9 hours ago - Pushed at: about 11 hours ago - Stars: 875 - Forks: 37

FireRedTeam/FireRedASR

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recognition capability.

Language: Python - Size: 658 KB - Last synced at: 29 days ago - Pushed at: about 2 months ago - Stars: 873 - Forks: 63

yeyupiaoling/PPASR

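End-to-end Chinese speech recognition built on PaddlePaddle, from getting started to real-world use, with very simple introductory examples and highly practical enterprise projects. Supports the currently most popular models: DeepSpeech2, Conformer, and Squeezeformer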

Language: Python - Size: 17.7 MB - Last synced at: 4 days ago - Pushed at: 10 days ago - Stars: 859 - Forks: 129

wwbin2017/bailing

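Bailing (百聆) is a GPT-4o-like voice dialogue robot implemented with ASR + LLM + TTS. It integrates strong large models such as DeepSeek R1, achieves latency as low as 800 ms, runs on low-spec machines such as Macs, and supports interruption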

Language: Python - Size: 2.72 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 838 - Forks: 143

srvk/eesen

The official repository of the Eesen project

Language: C++ - Size: 5.86 MB - Last synced at: 9 days ago - Pushed at: almost 6 years ago - Stars: 829 - Forks: 342

snakers4/open_stt 📦

Open STT

Language: Python - Size: 87.9 KB - Last synced at: 3 days ago - Pushed at: about 3 years ago - Stars: 794 - Forks: 84

kaituoxu/Speech-Transformer

A PyTorch implementation of Speech Transformer, an End-to-End ASR with Transformer network on Mandarin Chinese.

Language: Python - Size: 678 KB - Last synced at: 6 months ago - Pushed at: about 2 years ago - Stars: 771 - Forks: 196

byjlw/video-analyzer

Analyze videos using LLMs, Computer Vision and Automatic Speech Recognition

Language: Python - Size: 320 KB - Last synced at: 22 days ago - Pushed at: about 1 month ago - Stars: 753 - Forks: 93

alphacep/vosk-android-demo

Offline speech recognition for Android with Vosk library.

Language: Java - Size: 269 MB - Last synced at: 6 months ago - Pushed at: over 1 year ago - Stars: 752 - Forks: 203

yeyupiaoling/PaddlePaddle-DeepSpeech

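Speech recognition implemented with PaddlePaddle, targeting Chinese speech recognition. The project is mature and recognition quality is good. Supports training and inference on Windows and Linux, and inference on Nvidia Jetson development boards.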

Language: Python - Size: 15 MB - Last synced at: 27 days ago - Pushed at: 5 months ago - Stars: 726 - Forks: 148

Ailln/cn2an

📦 Quickly convert between Chinese numerals and Arabic numerals (latest features: conversion of fractions, dates, temperatures, and more)

Language: Python - Size: 685 KB - Last synced at: 26 days ago - Pushed at: 5 months ago - Stars: 716 - Forks: 79

openspeech-team/openspeech

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

Language: Python - Size: 7.49 MB - Last synced at: 20 days ago - Pushed at: over 1 year ago - Stars: 697 - Forks: 116

iceychris/LibreASR 📦

💬 An On-Premises, Streaming Speech Recognition System

Language: Python - Size: 6.15 MB - Last synced at: 3 days ago - Pushed at: over 3 years ago - Stars: 683 - Forks: 30

k2-fsa/sherpa

Speech-to-text server framework with next-gen Kaldi

Language: C++ - Size: 114 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 676 - Forks: 116

yeyupiaoling/MASR

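A streaming and non-streaming automatic speech recognition framework implemented in PyTorch, compatible with both online and offline recognition. It currently supports the Conformer, Squeezeformer, and DeepSpeech2 models, along with a variety of data augmentation methods.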

Language: Python - Size: 9.43 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 669 - Forks: 112

DmitryRyumin/INTERSPEECH-2023-24-Papers

INTERSPEECH 2023-2024 Papers: A complete collection of influential and exciting research papers from the INTERSPEECH 2023-24 conference. Explore the latest advances in speech and language processing. Code included. Star the repository to support the advancement of speech technology!

Size: 11.4 MB - Last synced at: about 5 hours ago - Pushed at: 5 months ago - Stars: 668 - Forks: 42

nyrahealth/CrisperWhisper

Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection

Language: Python - Size: 8.37 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 662 - Forks: 31

Picovoice/cheetah

On-device streaming speech-to-text engine powered by deep learning

Language: Python - Size: 290 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 621 - Forks: 71

sooftware/kospeech

Open-Source Toolkit for End-to-End Korean Automatic Speech Recognition leveraging PyTorch and Hydra.

Language: Python - Size: 920 MB - Last synced at: 28 days ago - Pushed at: almost 2 years ago - Stars: 617 - Forks: 192

abhirooptalasila/AutoSub

A CLI script to generate subtitle files (SRT/VTT/TXT) for any video using either DeepSpeech or Coqui

Language: Python - Size: 91.8 KB - Last synced at: 28 days ago - Pushed at: over 1 year ago - Stars: 596 - Forks: 103

hirofumi0810/neural_sp

End-to-end ASR/LM implementation with PyTorch

Language: Python - Size: 8.66 MB - Last synced at: 9 days ago - Pushed at: over 3 years ago - Stars: 596 - Forks: 139

zw76859420/ASR_Theory

Speech recognition theory, papers, and slide decks (PPT)

Size: 253 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 575 - Forks: 182

yinruiqing/pyannote-whisper

Language: Python - Size: 3.34 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 574 - Forks: 100

speechio/chinese_text_normalization

Chinese text normalization for speech processing

Language: Python - Size: 918 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 554 - Forks: 135

RapidAI/RapidASR

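📣 A commercial-grade open-source automatic speech recognition library: works out of the box, supports all platforms, and recognizes mixed Chinese and English speech. A cross-platform implementation of ASR inference, based on ONNXRuntime and FunASR. We provide a set of easier APIs to call ASR models.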

Language: C++ - Size: 35.8 MB - Last synced at: 29 days ago - Pushed at: 12 months ago - Stars: 544 - Forks: 63

Macoron/whisper.unity

Running speech to text model (whisper.cpp) in Unity3d on your local machine.

Language: C# - Size: 114 MB - Last synced at: 26 days ago - Pushed at: 26 days ago - Stars: 515 - Forks: 116

SpeechColab/Leaderboard

SpeechIO Leaderboard: a large, robust, comprehensive, benchmarking platform for Automatic Speech Recognition.

Language: Python - Size: 15.7 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 483 - Forks: 65

DmitryRyumin/ICASSP-2023-24-Papers

ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!

Language: Python - Size: 9.11 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 459 - Forks: 18

Picovoice/leopard

On-device speech-to-text engine powered by deep learning

Language: Python - Size: 419 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 454 - Forks: 28

jonatasgrosman/huggingsound

HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools

Language: Python - Size: 598 KB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 447 - Forks: 45

gooofy/zamia-speech

Open tools and data for cloudless automatic speech recognition

Language: Python - Size: 192 MB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 447 - Forks: 84

ccoreilly/vosk-browser

A speech recognition library running in the browser thanks to a WebAssembly build of Vosk

Language: JavaScript - Size: 707 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 423 - Forks: 66

double22a/speech_dataset

The dataset of Speech Recognition

Size: 74.2 KB - Last synced at: 6 days ago - Pushed at: 5 months ago - Stars: 413 - Forks: 77

DeutscheKI/tevr-asr-tool

State-of-the-art (ranked #1 Aug 2022) German Speech Recognition in 284 lines of C++. This is a 100% private 100% offline 100% free CLI tool.

Language: C - Size: 289 KB - Last synced at: 28 days ago - Pushed at: almost 3 years ago - Stars: 413 - Forks: 18

shashikg/WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engines

Language: Jupyter Notebook - Size: 1.16 MB - Last synced at: 3 days ago - Pushed at: 9 months ago - Stars: 406 - Forks: 49

revdotcom/reverb

Open source inference code for Rev's model

Language: Python - Size: 507 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 399 - Forks: 25

lium-lst/nmtpytorch 📦

Sequence-to-Sequence Framework in PyTorch

Language: Jupyter Notebook - Size: 7.49 MB - Last synced at: 4 days ago - Pushed at: over 2 years ago - Stars: 391 - Forks: 51

deepgram-devs/deepgram-ai-agent-demo (fork of deepgram-starters/nextjs-live-transcription)

Deepgram Conversational AI demo

Language: TypeScript - Size: 10.2 MB - Last synced at: 3 days ago - Pushed at: 26 days ago - Stars: 388 - Forks: 108

metame-ai/awesome-audio-plaza

Daily tracking of awesome audio papers, including music generation, zero-shot TTS, ASR, and audio generation

Size: 358 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 382 - Forks: 17

Evil0ctal/Fast-Powerful-Whisper-AI-Services-API

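⚡ A high-performance asynchronous API for automatic speech recognition (ASR) and translation. No need to purchase the Whisper API: inference runs on a locally hosted Whisper model, with multi-GPU concurrency and a design aimed at distributed deployment. It also includes built-in crawlers for social media platforms such as TikTok and Douyin, enabling seamless processing of media from multiple social platforms and providing a powerful, scalable solution for automated processing of media content data.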

Language: Python - Size: 1.21 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 368 - Forks: 42

HMS-Core/hms-ml-demo

HMS ML Demo provides an example of integrating the Huawei ML Kit service into applications. This example demonstrates how to integrate services provided by ML Kit, such as face detection, text recognition, image segmentation, ASR, and TTS.

Language: Java - Size: 229 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 367 - Forks: 120