GitHub topics: automatic-speech-recognition
estuary-ai/mangrove
Mangrove is the backend module of Estuary, a framework for building multimodal real-time Socially Intelligent Agents (SIAs).
Language: Python - Size: 2.23 MB - Last synced at: about 15 hours ago - Pushed at: about 16 hours ago - Stars: 11 - Forks: 2

Livyatan-melvillei/ai-clips-maker
AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.
Language: Python - Size: 69.3 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 16 - Forks: 3

Picovoice/cheetah
On-device streaming speech-to-text engine powered by deep learning
Language: Python - Size: 504 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 634 - Forks: 71

ckaytev/tgisper
Telegram bot with ASR
Language: Python - Size: 125 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 22 - Forks: 3

mdhasanai/Bangla_E2E_ASR
Bangla Automatic Speech Recognition
Language: Python - Size: 191 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

tsmdt/whisply
💬 Transcribe, translate, diarize, annotate and subtitle video (and audio) with Whisper on Win, Linux and Mac ... fast!
Language: Python - Size: 4.06 MB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 53 - Forks: 12

TEN-framework/ten-vad
Voice Activity Detector (VAD) from TEN: low-latency, high-performance and lightweight
Language: C - Size: 9.79 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 627 - Forks: 59

mgonzs13/whisper_ros
Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2
Language: C++ - Size: 1.93 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 78 - Forks: 19

coqui-ai/STT
🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
Language: C++ - Size: 53.4 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 2,459 - Forks: 289

ahmetoner/whisper-asr-webservice
OpenAI Whisper ASR Webservice API
Language: Python - Size: 1.76 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2,693 - Forks: 480

jitsi/jiwer
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Language: Python - Size: 1.68 MB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 745 - Forks: 104

alif-munim/clinical-camel-asr
An end-to-end pipeline for clinical dialogue transcription and summarization using large language models.
Language: Python - Size: 64.5 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 2 - Forks: 0

Picovoice/leopard
On-device speech-to-text engine powered by deep learning
Language: Python - Size: 421 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 457 - Forks: 29

mocomoco-inc/mocovoice-mcp-server
mocoVoice MCP Server
Language: Python - Size: 37.1 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 3 - Forks: 0

noco-ai/spellbook-docker
AI stack for interacting with LLMs, Stable Diffusion, Whisper, xTTS and many other AI models
Language: Shell - Size: 2.39 MB - Last synced at: 9 days ago - Pushed at: about 1 year ago - Stars: 160 - Forks: 12

nico-byte/whisper-web
The Whisper Web Transcription Server is a Python-based real-time speech-to-text transcription system powered by OpenAI's Whisper models. It leverages state-of-the-art models like Distil-Whisper to transcribe audio input in real-time.
Language: Python - Size: 6.23 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0

TensorSpeech/TensorFlowASR
⚡ TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in TensorFlow 2. Supports languages that can use characters or subwords
Language: Python - Size: 90.3 MB - Last synced at: 11 days ago - Pushed at: 21 days ago - Stars: 983 - Forks: 242

leduckhai/MultiMed
[LREC-COLING 2024 (Oral), Interspeech 2024 (Oral), NAACL 2025, ACL 2025] A Series of Multilingual Multitask Medical Speech Processing
Language: Python - Size: 22.3 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 350 - Forks: 36

zzw922cn/awesome-speech-recognition-speech-synthesis-papers
Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)
Size: 197 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 3,050 - Forks: 515

mende237/Nda-Nda-Force-Aligner
Forced alignment of Nda'Nda', a Cameroonian language
Language: Shell - Size: 727 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 2 - Forks: 0

EmulationAI/awesome-large-audio-models
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
Size: 6.56 MB - Last synced at: 13 days ago - Pushed at: 11 months ago - Stars: 679 - Forks: 42

ieasybooks/tafrigh
Transcribe audio to text and create SRT and VTT files using Whisper models and wit.ai technology.
Language: Python - Size: 631 KB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 133 - Forks: 18

acw-upv/INTERSPEECH2023_AlzheimersDisease
This repository contains the code for the INTERSPEECH2023 paper: "Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses"
Language: Jupyter Notebook - Size: 214 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 6 - Forks: 0

EliFuzz/parakeet-mlx
Unified, high-performance NVIDIA Parakeet ASR implementation for Apple Silicon (MLX) with real-time transcription and advanced audio processing capabilities
Language: Python - Size: 61.5 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

mozilla-ai/speech-to-text
Blueprint by Mozilla.ai on how to transcribe audio files
Size: 239 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 13 - Forks: 0

YoavRamon/awesome-kaldi
This is a list of features, scripts, blogs and resources for better using Kaldi ( http://kaldi-asr.org/ )
Size: 18.6 KB - Last synced at: 13 days ago - Pushed at: over 3 years ago - Stars: 536 - Forks: 84

kakaobrain/pororo 📦
PORORO: Platform Of neuRal mOdels for natuRal language prOcessing
Language: Python - Size: 12.8 MB - Last synced at: 18 days ago - Pushed at: over 3 years ago - Stars: 1,297 - Forks: 223

Rafat-decodis/Robust-ASR-for-Low-Resource-Languages
Exploring Benchmark Gaps and Real-World Speech Generalization for Low-Resource Languages
Language: Jupyter Notebook - Size: 1.9 MB - Last synced at: 9 days ago - Pushed at: 21 days ago - Stars: 2 - Forks: 0

NavodPeiris/speechlib
speechlib is a library that can do speaker diarization, transcription and speaker recognition on an audio file to create transcripts with actual speaker names
Language: Python - Size: 33.9 MB - Last synced at: 19 days ago - Pushed at: 3 months ago - Stars: 219 - Forks: 21

rolczynski/Automatic-Speech-Recognition 📦
🎧 Automatic Speech Recognition: DeepSpeech & Seq2Seq (TensorFlow)
Language: Python - Size: 3.6 MB - Last synced at: 6 days ago - Pushed at: about 5 years ago - Stars: 225 - Forks: 63

OpenVoiceOS/ovos-stt-plugin-chromium
An STT plugin for Mycroft using the Google Chrome browser API
Language: Python - Size: 36.1 KB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 2 - Forks: 1

snakers4/open_stt 📦
Open STT
Language: Python - Size: 87.9 KB - Last synced at: 2 days ago - Pushed at: over 3 years ago - Stars: 798 - Forks: 84

smeetrs/deep_avsr
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
Language: Python - Size: 45.9 KB - Last synced at: 26 days ago - Pushed at: over 1 year ago - Stars: 232 - Forks: 41

a-iceberg/whisper-timestamped
Timestamped ASR microservice
Language: Jupyter Notebook - Size: 3.29 MB - Last synced at: 23 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

aliencaocao/TIL-2024
Brainhack TIL 2024: Team 12000SGDPLUSHIE
Language: Jupyter Notebook - Size: 235 MB - Last synced at: 28 days ago - Pushed at: 29 days ago - Stars: 14 - Forks: 2

wenet-e2e/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Language: Python - Size: 24.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 4,539 - Forks: 1,134

jonatasgrosman/huggingsound
HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools
Language: Python - Size: 598 KB - Last synced at: 16 days ago - Pushed at: almost 2 years ago - Stars: 457 - Forks: 45

mllpresearch/ESO-dataset
ESO speech dataset: an English-language speech corpus of the oncology domain for ASR training and benchmarking and MT benchmarking.
Size: 27.3 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 1

analyticsinmotion/werx
🐍📦 Easy-to-use Python package for lightning-fast Word Error Rate analysis
Language: Python - Size: 227 KB - Last synced at: 16 days ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

siddqamar/Quick-Transcribe-AI
Get the transcripts for your voicemails, lectures & meetings.
Language: Python - Size: 3.91 KB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

lucasnewman/best-rq-pytorch
Implementation of BEST-RQ - a model for self-supervised learning of speech signals using a random projection quantizer, in Pytorch.
Language: Python - Size: 365 KB - Last synced at: 8 days ago - Pushed at: almost 2 years ago - Stars: 120 - Forks: 11

taresh18/AnimeVox
🎧 11K High Quality Anime Audio Clips, Transcriptions & Speaker Labels for TTS, ASR & Voice Cloning ✨
Size: 0 Bytes - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

01Zhangbw/Speech-and-audio-papers-Top-Conference
Papers in the speech and audio field. Currently covers: ICLR2025-2023, ICML2025-2023, NeurIPS2024-2023, ACMMM2024, AAAI2025-2024, ACL2025-2024, EMNLP2024, NAACL2025, IJCAI2024, ECCV2024
Size: 290 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 61 - Forks: 1

ArthurFDLR/whisper-youtube
🔉 YouTube Videos Transcription with OpenAI's Whisper
Language: Jupyter Notebook - Size: 124 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 396 - Forks: 115

j3soon/whisper-to-input
An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text; Supports English, Chinese, Japanese, etc. and even mixed languages.
Language: Kotlin - Size: 3.27 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 77 - Forks: 7

Lingua-Connect/Lingua-Frontend
lingua_Connect is a modern, interactive web application that leverages machine translation to enable real-time multilingual communication between field officers and farmers in rural settings.
Language: TypeScript - Size: 286 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 1

pr0mila/MediBeng-Whisper-Tiny
MediBeng Whisper Tiny improves doctor-patient transcription by training the Whisper Tiny model to translate mixed Bengali-English speech into English, making it easier for analysis, record-keeping, and using AI in healthcare.
Language: Python - Size: 2.23 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 15 - Forks: 1

roboticslab-uc3m/speech
Text To Speech (TTS) and Automatic Speech Recognition (ASR).
Language: C++ - Size: 71.9 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 4 - Forks: 5

pr0mila/ParquetToHuggingFace
ParquetToHuggingFace processes raw audio data, converts it into Parquet files, and uploads them to Hugging Face. The README explains how to set up the environment, configure paths, and run the scripts to generate and upload the data.
Language: Python - Size: 2.84 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 5 - Forks: 0

dangvansam/viet-asr
VietASR - Vietnamese Automatic Speech Recognition
Language: Python - Size: 289 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 130 - Forks: 54

JarbasAl/pocketsphinx-models-mirror
pocketsphinx models for languages originating from the iberian peninsula
Size: 337 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 9 - Forks: 4

undertheseanlp/automatic_speech_recognition
Vietnamese Automatic Speech Recognition
Language: Python - Size: 131 MB - Last synced at: 7 days ago - Pushed at: over 6 years ago - Stars: 69 - Forks: 38

double22a/asr_nlp_paper_code
Papers of ASR, Tools of ASR
Size: 655 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 40 - Forks: 9

zzw922cn/Automatic_Speech_Recognition
End-to-end Automatic Speech Recognition for Mandarin and English in Tensorflow
Language: Python - Size: 5.53 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 2,842 - Forks: 533

KBNLresearch/videotools Fork of ookgezellig/videotools
A collection of tools to cut, compress, extract, amplify and transcribe (audiotracks of) video files
Language: Python - Size: 6.81 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

ookgezellig/videotools
A collection of tools to cut, compress, extract, amplify and transcribe (audiotracks of) video files
Language: Python - Size: 6.81 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 1

dragonhub0710/oraguide
Oraguide is an AI voice agent designed to enhance customer interactions through real-time voice communication.
Language: JavaScript - Size: 79.1 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Pleasurecruise/3D-AI-Agent
This project aims to create an AI agent capable of expressing a range of emotions through facial expressions and tone of voice, using Large Language Models (LLMs) and Large Vision Models (LVMs).
Language: C# - Size: 626 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 1

analyticsinmotion/werpy
🐍📦 Ultra-fast Python package for calculating and analyzing the Word Error Rate (WER). Built for the scalable evaluation of speech and transcription accuracy.
Language: Python - Size: 603 KB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 13 - Forks: 4

double22a/speech_dataset
The dataset of Speech Recognition
Size: 74.2 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 413 - Forks: 77

aliyzd95/modified_shemo
A modification on the Sharif Emotional Speech Database
Language: Jupyter Notebook - Size: 1.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 8 - Forks: 2

0xPD33/sonori
Sonori is a fully local STT app for Linux (Wayland).
Language: Rust - Size: 106 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

sinaahmadi/CORDI
Language and Speech Technology for Central Kurdish Varieties (LREC-COLING 2024)
Language: Python - Size: 25.9 MB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 11 - Forks: 2

srinivr/kaldi-long-audio-alignment
Long audio alignment using Kaldi
Language: Shell - Size: 26.4 KB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 23 - Forks: 10

FireRedTeam/FireRedASR
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recognition capability.
Language: Python - Size: 658 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 873 - Forks: 63

PrathuashaKB/ASR-Using-Deep-Learning
Automatic Speech Recognition is a technique that processes human speech into readable text, also known as speech-to-text or transcription systems. Mini-Project I at SSIT: Project cycle closed.
Language: Python - Size: 7.22 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 2

linto-ai/linto-agent
LinTO platform services stack deployment tool for Docker Swarm cluster
Language: JavaScript - Size: 1.01 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 1

ttop32/wav2vec2-live-japanese-translator
Real-time Japanese speech recognition translator using wav2vec2
Language: Jupyter Notebook - Size: 926 KB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 38 - Forks: 3

lkmeta/txtify
Web application that converts audio and video to text using AI, supporting various formats and self-hosting.
Language: Python - Size: 6.77 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 80 - Forks: 7

alperensumeroglu/ai-clips-maker
AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.
Language: Python - Size: 2.93 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

saurabhchalke/whisper-meta-quest
Running speech-to-text in a Meta Quest headset using OpenAI's Whisper tiny model
Language: C# - Size: 98.1 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 31 - Forks: 3

github-bowen/TUD-Inclusive-Speech-Technology
Lab assignments of course DSAIT4095 Inclusive Speech Technology (2024/2025) in TU Delft.
Language: Jupyter Notebook - Size: 7.14 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

mozilla-ai/speech-to-text-finetune
Blueprint by Mozilla.ai for finetuning a Speech-To-Text model in your own language
Language: Python - Size: 5.24 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 36 - Forks: 4

CoEDL/elpis
🙊 software for creating speech recognition models.
Language: Python - Size: 82.5 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 159 - Forks: 33

tugstugi/mongolian-speech-recognition
Mongolian speech recognition with PyTorch
Language: Python - Size: 164 KB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 134 - Forks: 52

mahshid1378/TensorFlowASR
TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in TensorFlow 2. Supports languages that can use characters or subwords
Language: Python - Size: 4.89 MB - Last synced at: 20 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

ZQuang2202/Zipformer_Lightning
An upgraded framework for training and validation, compared with icefall, using Lightning.
Language: Python - Size: 268 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 8 - Forks: 0

ZQuang2202/Zipformer_Triton
A template for serving zipformer on Triton Inference Server.
Language: Python - Size: 1.65 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

jmaczan/asr-dysarthria
Research on Automatic Speech Recognition for dysarthric speech
Language: Jupyter Notebook - Size: 2.64 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 11 - Forks: 2

egorsmkv/w2v2-bert-aligner
Aligner for wav2vec2-bert models
Language: Python - Size: 1.95 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 3 - Forks: 1

ksm26/Serverless-LLM-apps-with-Amazon-Bedrock
The course equips you with the skills to deploy Large Language Model (LLM)-based applications into production using serverless technology with Amazon Bedrock.
Language: Jupyter Notebook - Size: 1.66 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 8

bagustris/detect-segment-cough
A Python model to detect and segment coughs, forked from coughvid's repo
Language: Jupyter Notebook - Size: 822 KB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 10 - Forks: 3

j3soon/speech-to-windows-input
Perform speech-to-text (STT/ASR) with Azure speech service and simulate keyboard to input the recognized text; Supports English, Chinese, Japanese, and more.
Language: C# - Size: 2.4 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 32 - Forks: 3

khakers/go-subgen
Automatically generate subtitles for your media using whisper.cpp via webhooks with support for Radarr & Sonarr
Language: Go - Size: 7.6 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 62 - Forks: 1

George0828Zhang/torch_cif
A fast parallel PyTorch implementation of the "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition" https://arxiv.org/abs/1905.11235.
Language: Python - Size: 167 KB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 33 - Forks: 3

matusstas/openai-whisper-microservice
This is an OpenAI Whisper automatic speech recognition microservice
Language: Python - Size: 791 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 22 - Forks: 2

DominikLindorfer/Speech-to-Clipboard
Vibecoding on Windows using Speech-to-Clipboard
Language: Python - Size: 56.6 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

MooersLab/bash-whisper-transcription
Bash function to ease the transcription of audio files with OpenAI's whisper.
Language: Python - Size: 159 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 1

Jeronymous/deep_learning_notebooks
Self-contained notebooks for experimenting with particular concepts in Deep Learning
Language: Jupyter Notebook - Size: 17.1 MB - Last synced at: 15 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

a-iceberg/whisper_model_evaluator Fork of format37/vosk_model_evaluator
Comparator of WER, MER, and WIL for Whisper vs Vosk vs Google transcribers
Language: Jupyter Notebook - Size: 1.38 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

winstxnhdw/CapGen
A fast CPU-first video/audio transcriber for generating caption files with Whisper and CTranslate2, hosted on Hugging Face Spaces.
Language: Python - Size: 1.09 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 10 - Forks: 1

egorsmkv/cv10-uk-testset-clean
The cleaned Common Voice 10 (test set) that has been checked by a human for Ukrainian 🇺🇦
Size: 409 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

PyThaiNLP/pythaiasr
Python Thai Automatic Speech Recognition
Language: Python - Size: 178 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 66 - Forks: 13

lucasgris/wav2vec4bp
Wav2vec resources and models for Brazilian Portuguese
Language: Jupyter Notebook - Size: 1.65 MB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 33 - Forks: 2

sungnyun/ARMHuBERT
(Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT
Language: Python - Size: 4.52 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 39 - Forks: 6

egorsmkv/whisper-ukrainian 📦
Trainer and Evaluation scripts for fine-tuning Whisper models for the Ukrainian language
Language: Python - Size: 69.3 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 18 - Forks: 0

kmario23/KenLM-training
Training an n-gram based Language Model using KenLM toolkit for Deep Speech 2
Size: 5.86 KB - Last synced at: 3 months ago - Pushed at: about 6 years ago - Stars: 114 - Forks: 21

joybratasarkar/Text-to-speech-Wav2vec-pretained
A fine-tuned Wav2Vec2-based Automatic Speech Recognition (ASR) system with data augmentation, efficient training, and transcription capabilities. Supports local and Mozilla Common Voice datasets, with evaluation via Word Error Rate (WER). 🚀
Language: Python - Size: 0 Bytes - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Nexdata-AI/128-Hours-English-Australia-Children-Real-world-Casual-Conversation-and-Monologue-speech-dataset
Size: 1.95 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Nexdata-AI/203-Hours-German-Financial-Entities-Real-world-Casual-Conversation-and-Monologue-speech-dataset
Size: 1.95 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0
