An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: automatic-speech-recognition

estuary-ai/mangrove

Mangrove is the backend module of Estuary, a framework for building multimodal real-time Socially Intelligent Agents (SIAs).

Language: Python - Size: 2.23 MB - Last synced at: about 15 hours ago - Pushed at: about 16 hours ago - Stars: 11 - Forks: 2

Livyatan-melvillei/ai-clips-maker

AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.

Language: Python - Size: 69.3 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 16 - Forks: 3

Picovoice/cheetah

On-device streaming speech-to-text engine powered by deep learning

Language: Python - Size: 504 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 634 - Forks: 71

ckaytev/tgisper

Telegram bot with ASR

Language: Python - Size: 125 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 22 - Forks: 3

mdhasanai/Bangla_E2E_ASR

Bangla Automatic Speech Recognition

Language: Python - Size: 191 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

tsmdt/whisply

💬 Transcribe, translate, diarize, annotate and subtitle video (and audio) with Whisper on Win, Linux and Mac ... fast!

Language: Python - Size: 4.06 MB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 53 - Forks: 12

TEN-framework/ten-vad

Voice Activity Detector (VAD) from TEN: low-latency, high-performance, and lightweight

Language: C - Size: 9.79 MB - Last synced at: 3 days ago - Pushed at: 6 days ago - Stars: 627 - Forks: 59

mgonzs13/whisper_ros

Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2

Language: C++ - Size: 1.93 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 78 - Forks: 19

coqui-ai/STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

Language: C++ - Size: 53.4 MB - Last synced at: 4 days ago - Pushed at: over 1 year ago - Stars: 2,459 - Forks: 289

ahmetoner/whisper-asr-webservice

OpenAI Whisper ASR Webservice API

Language: Python - Size: 1.76 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2,693 - Forks: 480

jitsi/jiwer

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)

Language: Python - Size: 1.68 MB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 745 - Forks: 104
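Word error rate (WER), the headline measure jiwer reports, is the minimum number of word substitutions (S), deletions (D), and insertions (I) needed to turn the reference into the hypothesis, divided by the number of reference words N: WER = (S + D + I) / N. The sketch below is a minimal pure-Python rendering of that definition, not jiwer's own API or implementation. The function name `word_error_rate` is hypothetical, and it assumes plain whitespace tokenization with no case or punctuation normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace only; no normalization
    (lowercasing, punctuation stripping) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))  # empty ref -> j insertions
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)   # empty hyp -> i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr

    # (S + D + I) / N
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.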

alif-munim/clinical-camel-asr

An end-to-end pipeline for clinical dialogue transcription and summarization using large language models.

Language: Python - Size: 64.5 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 2 - Forks: 0

Picovoice/leopard

On-device speech-to-text engine powered by deep learning

Language: Python - Size: 421 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 457 - Forks: 29

mocomoco-inc/mocovoice-mcp-server

mocoVoice MCP Server

Language: Python - Size: 37.1 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 3 - Forks: 0

noco-ai/spellbook-docker

AI stack for interacting with LLMs, Stable Diffusion, Whisper, xTTS and many other AI models

Language: Shell - Size: 2.39 MB - Last synced at: 9 days ago - Pushed at: about 1 year ago - Stars: 160 - Forks: 12

nico-byte/whisper-web

The Whisper Web Transcription Server is a Python-based real-time speech-to-text transcription system powered by OpenAI's Whisper models. It leverages state-of-the-art models like Distil-Whisper to transcribe audio input in real-time.

Language: Python - Size: 6.23 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0

TensorSpeech/TensorFlowASR

⚡ TensorFlowASR: Almost state-of-the-art automatic speech recognition in TensorFlow 2. Supports languages that can be represented with characters or subwords.

Language: Python - Size: 90.3 MB - Last synced at: 11 days ago - Pushed at: 21 days ago - Stars: 983 - Forks: 242

leduckhai/MultiMed

[LREC-COLING 2024 (Oral), Interspeech 2024 (Oral), NAACL 2025, ACL 2025] A Series of Multilingual Multitask Medical Speech Processing

Language: Python - Size: 22.3 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 350 - Forks: 36

zzw922cn/awesome-speech-recognition-speech-synthesis-papers

Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)

Size: 197 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 3,050 - Forks: 515

mende237/Nda-Nda-Force-Aligner

Forced alignment for Nda‘ Nda’, a Cameroonian language

Language: Shell - Size: 727 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 2 - Forks: 0

EmulationAI/awesome-large-audio-models

Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.

Size: 6.56 MB - Last synced at: 13 days ago - Pushed at: 11 months ago - Stars: 679 - Forks: 42

ieasybooks/tafrigh

Transcribe audio to text and generate SRT and VTT files using Whisper models and wit.ai technology.

Language: Python - Size: 631 KB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 133 - Forks: 18

acw-upv/INTERSPEECH2023_AlzheimersDisease

This repository contains the code for the INTERSPEECH2023 paper: "Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses"

Language: Jupyter Notebook - Size: 214 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 6 - Forks: 0

EliFuzz/parakeet-mlx

Unified, high-performance NVIDIA Parakeet ASR implementation for Apple Silicon (MLX) with real-time transcription and advanced audio processing capabilities

Language: Python - Size: 61.5 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

mozilla-ai/speech-to-text

Blueprint by Mozilla.ai on how to transcribe audio files

Size: 239 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 13 - Forks: 0

YoavRamon/awesome-kaldi

A list of features, scripts, blogs, and resources for making better use of Kaldi (http://kaldi-asr.org/)

Size: 18.6 KB - Last synced at: 13 days ago - Pushed at: over 3 years ago - Stars: 536 - Forks: 84

kakaobrain/pororo 📦

PORORO: Platform Of neuRal mOdels for natuRal language prOcessing

Language: Python - Size: 12.8 MB - Last synced at: 18 days ago - Pushed at: over 3 years ago - Stars: 1,297 - Forks: 223

Rafat-decodis/Robust-ASR-for-Low-Resource-Languages

Exploring Benchmark Gaps and Real-World Speech Generalization for Low-Resource Languages

Language: Jupyter Notebook - Size: 1.9 MB - Last synced at: 9 days ago - Pushed at: 21 days ago - Stars: 2 - Forks: 0

NavodPeiris/speechlib

speechlib is a library that can do speaker diarization, transcription and speaker recognition on an audio file to create transcripts with actual speaker names

Language: Python - Size: 33.9 MB - Last synced at: 19 days ago - Pushed at: 3 months ago - Stars: 219 - Forks: 21

rolczynski/Automatic-Speech-Recognition 📦

🎧 Automatic Speech Recognition: DeepSpeech & Seq2Seq (TensorFlow)

Language: Python - Size: 3.6 MB - Last synced at: 6 days ago - Pushed at: about 5 years ago - Stars: 225 - Forks: 63

OpenVoiceOS/ovos-stt-plugin-chromium

An STT plugin for Mycroft using the Google Chrome browser API

Language: Python - Size: 36.1 KB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 2 - Forks: 1

snakers4/open_stt 📦

Open STT

Language: Python - Size: 87.9 KB - Last synced at: 2 days ago - Pushed at: over 3 years ago - Stars: 798 - Forks: 84

smeetrs/deep_avsr

A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.

Language: Python - Size: 45.9 KB - Last synced at: 26 days ago - Pushed at: over 1 year ago - Stars: 232 - Forks: 41

a-iceberg/whisper-timestamped

Timestamped ASR microservice

Language: Jupyter Notebook - Size: 3.29 MB - Last synced at: 23 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

aliencaocao/TIL-2024

Brainhack TIL 2024: Team 12000SGDPLUSHIE

Language: Jupyter Notebook - Size: 235 MB - Last synced at: 28 days ago - Pushed at: 29 days ago - Stars: 14 - Forks: 2

wenet-e2e/wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

Language: Python - Size: 24.3 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 4,539 - Forks: 1,134

jonatasgrosman/huggingsound

HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools

Language: Python - Size: 598 KB - Last synced at: 16 days ago - Pushed at: almost 2 years ago - Stars: 457 - Forks: 45

mllpresearch/ESO-dataset

ESO speech dataset: an English-language speech corpus in the oncology domain for ASR training and benchmarking, and for MT benchmarking.

Size: 27.3 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 1

analyticsinmotion/werx

🐍📦 Easy-to-use Python package for lightning-fast Word Error Rate analysis

Language: Python - Size: 227 KB - Last synced at: 16 days ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

siddqamar/Quick-Transcribe-AI

Get the transcripts for your voicemails, lectures & meetings.

Language: Python - Size: 3.91 KB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

lucasnewman/best-rq-pytorch

Implementation of BEST-RQ, a model for self-supervised learning of speech signals using a random-projection quantizer, in PyTorch.

Language: Python - Size: 365 KB - Last synced at: 8 days ago - Pushed at: almost 2 years ago - Stars: 120 - Forks: 11

taresh18/AnimeVox

🎧 11K High Quality Anime Audio Clips, Transcriptions & Speaker Labels for TTS, ASR & Voice Cloning ✨

Size: 0 Bytes - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

01Zhangbw/Speech-and-audio-papers-Top-Conference

Papers in the speech and audio field from top conferences. Currently updated with: ICLR2025-2023, ICML2025-2023, NeurIPS2024-2023, ACMMM2024, AAAI2025-2024, ACL2025-2024, EMNLP2024, NAACL2025, IJCAI2024, ECCV2024

Size: 290 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 61 - Forks: 1

ArthurFDLR/whisper-youtube

🔉 YouTube video transcription with OpenAI's Whisper

Language: Jupyter Notebook - Size: 124 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 396 - Forks: 115

j3soon/whisper-to-input

An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text; supports English, Chinese, Japanese, etc., and even mixed languages.

Language: Kotlin - Size: 3.27 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 77 - Forks: 7

Lingua-Connect/Lingua-Frontend

lingua_Connect is a modern, interactive web application that leverages machine translation to enable real-time multilingual communication between field officers and farmers in rural settings.

Language: TypeScript - Size: 286 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 1

pr0mila/MediBeng-Whisper-Tiny

MediBeng Whisper Tiny improves doctor-patient transcription by training the Whisper Tiny model to translate mixed Bengali-English speech into English, making it easier for analysis, record-keeping, and using AI in healthcare.

Language: Python - Size: 2.23 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 15 - Forks: 1

roboticslab-uc3m/speech

Text To Speech (TTS) and Automatic Speech Recognition (ASR).

Language: C++ - Size: 71.9 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 4 - Forks: 5

pr0mila/ParquetToHuggingFace

ParquetToHuggingFace processes raw audio data, converts it into Parquet files, and uploads them to Hugging Face. The README explains how to set up the environment, configure paths, and run the scripts to generate and upload the data.

Language: Python - Size: 2.84 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 5 - Forks: 0

dangvansam/viet-asr

VietASR - Vietnamese Automatic Speech Recognition

Language: Python - Size: 289 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 130 - Forks: 54

JarbasAl/pocketsphinx-models-mirror

PocketSphinx models for languages originating from the Iberian Peninsula

Size: 337 MB - Last synced at: about 1 month ago - Pushed at: over 4 years ago - Stars: 9 - Forks: 4

undertheseanlp/automatic_speech_recognition

Vietnamese Automatic Speech Recognition

Language: Python - Size: 131 MB - Last synced at: 7 days ago - Pushed at: over 6 years ago - Stars: 69 - Forks: 38

double22a/asr_nlp_paper_code

Papers of ASR, Tools of ASR

Size: 655 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 40 - Forks: 9

zzw922cn/Automatic_Speech_Recognition

End-to-end automatic speech recognition for Mandarin and English in TensorFlow

Language: Python - Size: 5.53 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 2,842 - Forks: 533

KBNLresearch/videotools (fork of ookgezellig/videotools)

A collection of tools to cut, compress, extract, amplify, and transcribe (the audio tracks of) video files

Language: Python - Size: 6.81 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

ookgezellig/videotools

A collection of tools to cut, compress, extract, amplify, and transcribe (the audio tracks of) video files

Language: Python - Size: 6.81 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 1

dragonhub0710/oraguide

Oraguide is an AI voice agent designed to enhance customer interactions through real-time voice communication.

Language: JavaScript - Size: 79.1 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 0

Pleasurecruise/3D-AI-Agent

This project aims to create an AI agent capable of expressing a range of emotions through facial expressions and tone of voice, using Large Language Models (LLMs) and Large Vision Models (LVMs).

Language: C# - Size: 626 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 1

analyticsinmotion/werpy

🐍📦 Ultra-fast Python package for calculating and analyzing the Word Error Rate (WER). Built for the scalable evaluation of speech and transcription accuracy.

Language: Python - Size: 603 KB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 13 - Forks: 4

double22a/speech_dataset

Datasets for speech recognition

Size: 74.2 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 413 - Forks: 77

aliyzd95/modified_shemo

A modification of the Sharif Emotional Speech Database

Language: Jupyter Notebook - Size: 1.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 8 - Forks: 2

0xPD33/sonori

Sonori is a fully local STT app for Linux (Wayland).

Language: Rust - Size: 106 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

sinaahmadi/CORDI

Language and Speech Technology for Central Kurdish Varieties (LREC-COLING 2024)

Language: Python - Size: 25.9 MB - Last synced at: about 2 months ago - Pushed at: 7 months ago - Stars: 11 - Forks: 2

srinivr/kaldi-long-audio-alignment

Long audio alignment using Kaldi

Language: Shell - Size: 26.4 KB - Last synced at: about 1 month ago - Pushed at: about 4 years ago - Stars: 23 - Forks: 10

FireRedTeam/FireRedASR

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recognition capability.

Language: Python - Size: 658 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 873 - Forks: 63

PrathuashaKB/ASR-Using-Deep-Learning

Automatic speech recognition (ASR), also known as speech-to-text or transcription, is a technique that converts human speech into readable text. Mini-Project I at SSIT: project cycle closed.

Language: Python - Size: 7.22 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 2

linto-ai/linto-agent

LinTO platform services stack deployment tool for Docker Swarm cluster

Language: JavaScript - Size: 1.01 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 1

ttop32/wav2vec2-live-japanese-translator

Real-time Japanese speech recognition translator using wav2vec2

Language: Jupyter Notebook - Size: 926 KB - Last synced at: 2 months ago - Pushed at: almost 3 years ago - Stars: 38 - Forks: 3

lkmeta/txtify

Web application that converts audio and video to text using AI, supporting various formats and self-hosting.

Language: Python - Size: 6.77 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 80 - Forks: 7

alperensumeroglu/ai-clips-maker

AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.

Language: Python - Size: 2.93 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

saurabhchalke/whisper-meta-quest

Running speech-to-text in a Meta Quest headset using OpenAI's Whisper tiny model

Language: C# - Size: 98.1 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 31 - Forks: 3

github-bowen/TUD-Inclusive-Speech-Technology

Lab assignments for the course DSAIT4095 Inclusive Speech Technology (2024/2025) at TU Delft.

Language: Jupyter Notebook - Size: 7.14 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

mozilla-ai/speech-to-text-finetune

Blueprint by Mozilla.ai for finetuning a Speech-To-Text model in your own language

Language: Python - Size: 5.24 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 36 - Forks: 4

CoEDL/elpis

🙊 Software for creating speech recognition models.

Language: Python - Size: 82.5 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 159 - Forks: 33

tugstugi/mongolian-speech-recognition

Mongolian speech recognition with PyTorch

Language: Python - Size: 164 KB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 134 - Forks: 52

mahshid1378/TensorFlowASR

TensorFlowASR: Almost state-of-the-art automatic speech recognition in TensorFlow 2. Supports languages that can be represented with characters or subwords.

Language: Python - Size: 4.89 MB - Last synced at: 20 days ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

ZQuang2202/Zipformer_Lightning

An upgraded framework, built with Lightning, for training and validation, compared against icefall.

Language: Python - Size: 268 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 8 - Forks: 0

ZQuang2202/Zipformer_Triton

A template for serving zipformer on Triton Inference Server.

Language: Python - Size: 1.65 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

jmaczan/asr-dysarthria

Research on Automatic Speech Recognition for dysarthric speech

Language: Jupyter Notebook - Size: 2.64 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 11 - Forks: 2

egorsmkv/w2v2-bert-aligner

Aligner for wav2vec2-bert models

Language: Python - Size: 1.95 KB - Last synced at: about 2 months ago - Pushed at: 6 months ago - Stars: 3 - Forks: 1

ksm26/Serverless-LLM-apps-with-Amazon-Bedrock

The course equips you with the skills to deploy Large Language Model (LLM)-based applications into production using serverless technology with Amazon Bedrock.

Language: Jupyter Notebook - Size: 1.66 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 6 - Forks: 8

bagustris/detect-segment-cough

A Python model to detect and segment coughs, forked from the COUGHVID repository

Language: Jupyter Notebook - Size: 822 KB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 10 - Forks: 3

j3soon/speech-to-windows-input

Perform speech-to-text (STT/ASR) with Azure speech service and simulate keyboard to input the recognized text; Supports English, Chinese, Japanese, and more.

Language: C# - Size: 2.4 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 32 - Forks: 3

khakers/go-subgen

Automatically generate subtitles for your media using whisper.cpp via webhooks with support for Radarr & Sonarr

Language: Go - Size: 7.6 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 62 - Forks: 1

George0828Zhang/torch_cif

A fast parallel PyTorch implementation of the "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition" https://arxiv.org/abs/1905.11235.

Language: Python - Size: 167 KB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 33 - Forks: 3

matusstas/openai-whisper-microservice

This is an OpenAI Whisper automatic speech recognition microservice

Language: Python - Size: 791 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 22 - Forks: 2

DominikLindorfer/Speech-to-Clipboard

Vibecoding on Windows using Speech-to-Clipboard

Language: Python - Size: 56.6 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

MooersLab/bash-whisper-transcription

Bash function to ease the transcription of audio files with OpenAI's whisper.

Language: Python - Size: 159 KB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 1

Jeronymous/deep_learning_notebooks

Self-contained notebooks for experimenting with particular concepts in deep learning

Language: Jupyter Notebook - Size: 17.1 MB - Last synced at: 15 days ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

a-iceberg/whisper_model_evaluator (fork of format37/vosk_model_evaluator)

Compares WER, MER, and WIL of Whisper vs. Vosk vs. Google transcribers

Language: Jupyter Notebook - Size: 1.38 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

winstxnhdw/CapGen

A fast CPU-first video/audio transcriber for generating caption files with Whisper and CTranslate2, hosted on Hugging Face Spaces.

Language: Python - Size: 1.09 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 10 - Forks: 1

egorsmkv/cv10-uk-testset-clean

The cleaned Common Voice 10 test set for Ukrainian, checked by a human 🇺🇦

Size: 409 KB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

PyThaiNLP/pythaiasr

Python Thai Automatic Speech Recognition

Language: Python - Size: 178 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 66 - Forks: 13

lucasgris/wav2vec4bp

Wav2vec resources and models for Brazilian Portuguese

Language: Jupyter Notebook - Size: 1.65 MB - Last synced at: about 2 months ago - Pushed at: almost 3 years ago - Stars: 33 - Forks: 2

sungnyun/ARMHuBERT

(Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT

Language: Python - Size: 4.52 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 39 - Forks: 6

egorsmkv/whisper-ukrainian 📦

Training and evaluation scripts for fine-tuning Whisper models for the Ukrainian language

Language: Python - Size: 69.3 KB - Last synced at: 3 months ago - Pushed at: over 2 years ago - Stars: 18 - Forks: 0

kmario23/KenLM-training

Training an n-gram based Language Model using KenLM toolkit for Deep Speech 2

Size: 5.86 KB - Last synced at: 3 months ago - Pushed at: about 6 years ago - Stars: 114 - Forks: 21

joybratasarkar/Text-to-speech-Wav2vec-pretained

A fine-tuned Wav2Vec2-based Automatic Speech Recognition (ASR) system with data augmentation, efficient training, and transcription capabilities. Supports local and Mozilla Common Voice datasets, with evaluation via Word Error Rate (WER). 🚀

Language: Python - Size: 0 Bytes - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Nexdata-AI/128-Hours-English-Australia-Children-Real-world-Casual-Conversation-and-Monologue-speech-dataset

Size: 1.95 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Nexdata-AI/203-Hours-German-Financial-Entities-Real-world-Casual-Conversation-and-Monologue-speech-dataset

Size: 1.95 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 0 - Forks: 0

Related Keywords
automatic-speech-recognition (335), speech-recognition (134), asr (130), speech-to-text (117), deep-learning (75), whisper (46), audio (41), machine-learning (41), python (39), speech (38), dataset (31), pytorch (29), stt (25), voice-recognition (23), speech-synthesis (21), natural-language-processing (19), tts (19), text-to-speech (19), asr-model (19), tensorflow (18), deep-neural-networks (18), openai (17), huggingface (16), transcription (16), speech-processing (15), kaldi (14), audio-processing (14), wav2vec2 (13), artificial-intelligence (12), transformers (11), wav (10), transformer (10), nlp (9), huggingface-transformers (9), language-model (9), whisper-ai (9), ctc (9), fine-tuning (9), docker (9), ai (8), translation (8), kaldi-asr (8), speech-enhancement (7), python3 (7), neural-network (7), librispeech (7), openai-whisper (7), jasper (6), attention-mechanism (6), cnn (6), rnn (6), keras (6), android (6), deepspeech (6), recurrent-neural-networks (5), word-error-rate (5), voice (5), conformer (5), machine-translation (5), vosk (5), ctc-loss (5), large-language-models (5), youtube (5), faster-whisper (5), open-source (5), conversational-ai (5), neural-networks (5), mfcc (5), end-to-end (5), deepspeech2 (5), wer (5), quartznet (4), rnn-transducer (4), language-modeling (4), tensorflow2 (4), tflite (4), voice-activity-detection (4), pytorch-lightning (4), inference (4), speaker-recognition (4), synthetic-data (4), timit-dataset (4), subtitles (4), speech-translation (4), convolutional-neural-networks (4), data-analysis (4), common-voice (4), nlp-machine-learning (4), data-augmentation (3), engine (3), speech-recognizer (3), generative-adversarial-network (3), voice-assistant (3), onnxruntime (3), lstm (3), hidden-markov-models (3), wav2letter (3), kenlm (3), signal-processing (3), fastapi (3)