An open API service providing repository metadata for many open source software ecosystems.

Topic: "automatic-speech-recognition"

wenet-e2e/wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

Language: Python - Size: 24.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 4,780 - Forks: 1,155

zzw922cn/awesome-speech-recognition-speech-synthesis-papers

Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)

Size: 197 KB - Last synced at: 8 days ago - Pushed at: almost 2 years ago - Stars: 3,064 - Forks: 514

zzw922cn/Automatic_Speech_Recognition

End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow

Language: Python - Size: 5.53 MB - Last synced at: 4 months ago - Pushed at: over 2 years ago - Stars: 2,842 - Forks: 533

ahmetoner/whisper-asr-webservice

OpenAI Whisper ASR Webservice API

Language: Python - Size: 1.76 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2,693 - Forks: 480

coqui-ai/STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

Language: C++ - Size: 53.4 MB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 2,503 - Forks: 298

TEN-framework/ten-vad

Voice Activity Detector (VAD) from TEN: low-latency, high-performance and lightweight

Language: C - Size: 9.6 MB - Last synced at: 12 days ago - Pushed at: 28 days ago - Stars: 1,350 - Forks: 114

kakaobrain/pororo 📦

PORORO: Platform Of neuRal mOdels for natuRal language prOcessing

Language: Python - Size: 12.8 MB - Last synced at: 19 days ago - Pushed at: over 3 years ago - Stars: 1,302 - Forks: 223

TensorSpeech/TensorFlowASR

⚡ TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in TensorFlow 2. Supports languages that can be written with characters or subwords

Language: Python - Size: 90.3 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 985 - Forks: 240

FireRedTeam/FireRedASR

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics recognition capability.

Language: Python - Size: 658 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 873 - Forks: 63

snakers4/open_stt 📦

Open STT

Language: Python - Size: 87.9 KB - Last synced at: 10 days ago - Pushed at: over 3 years ago - Stars: 801 - Forks: 84

jitsi/jiwer

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)

Language: Python - Size: 1.68 MB - Last synced at: 8 days ago - Pushed at: 7 months ago - Stars: 781 - Forks: 105

EmulationAI/awesome-large-audio-models

Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.

Size: 6.56 MB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 691 - Forks: 43

shirayu/whispering 📦

Streaming transcriber with whisper

Language: Python - Size: 288 KB - Last synced at: 8 months ago - Pushed at: over 2 years ago - Stars: 686 - Forks: 53

Picovoice/cheetah

On-device streaming speech-to-text engine powered by deep learning

Language: Python - Size: 502 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 635 - Forks: 72

FluidInference/FluidAudio

Native Swift and CoreML SDK for local speaker diarization, VAD, and speech-to-text for real-time workloads. Works on iOS and macOS.

Language: Swift - Size: 13.6 MB - Last synced at: about 18 hours ago - Pushed at: about 18 hours ago - Stars: 604 - Forks: 70

hirofumi0810/neural_sp

End-to-end ASR/LM implementation with PyTorch

Language: Python - Size: 8.66 MB - Last synced at: 4 months ago - Pushed at: about 4 years ago - Stars: 596 - Forks: 139

YoavRamon/awesome-kaldi

This is a list of features, scripts, blogs and resources for better using Kaldi (http://kaldi-asr.org/)

Size: 18.6 KB - Last synced at: 29 days ago - Pushed at: over 3 years ago - Stars: 537 - Forks: 83

jonatasgrosman/huggingsound

HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools

Language: Python - Size: 598 KB - Last synced at: 3 days ago - Pushed at: almost 2 years ago - Stars: 462 - Forks: 45

Picovoice/leopard

On-device speech-to-text engine powered by deep learning

Language: Python - Size: 419 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 459 - Forks: 29

Z-yq/TensorflowASR

A project dedicated to bringing CPU/on-device model performance close to that of GPU models, with a real-time factor (RTF) below 0.1 on CPU

Language: Python - Size: 266 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 444 - Forks: 107

double22a/speech_dataset

The dataset of Speech Recognition

Size: 74.2 KB - Last synced at: 4 months ago - Pushed at: 9 months ago - Stars: 413 - Forks: 77

ArthurFDLR/whisper-youtube

🔉 Youtube Videos Transcription with OpenAI's Whisper

Language: Jupyter Notebook - Size: 124 KB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 396 - Forks: 115

leduckhai/MultiMed

[LREC-COLING 2024 (Oral), Interspeech 2024 (Oral), NAACL 2025, ACL 2025] A Series of Multilingual Multitask Medical Speech Processing

Language: Python - Size: 22.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 350 - Forks: 36

hirofumi0810/tensorflow_end2end_speech_recognition

End-to-End speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)

Language: Python - Size: 4.17 MB - Last synced at: 9 months ago - Pushed at: over 7 years ago - Stars: 313 - Forks: 120

smeetrs/deep_avsr

A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.

Language: Python - Size: 45.9 KB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 232 - Forks: 41

rolczynski/Automatic-Speech-Recognition 📦

🎧 Automatic Speech Recognition: DeepSpeech & Seq2Seq (TensorFlow)

Language: Python - Size: 3.6 MB - Last synced at: 11 days ago - Pushed at: about 5 years ago - Stars: 225 - Forks: 63

NavodPeiris/speechlib

speechlib is a library that can do speaker diarization, transcription and speaker recognition on an audio file to create transcripts with actual speaker names

Language: Python - Size: 33.9 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 224 - Forks: 22

m3hrdadfi/soxan

Wav2Vec for speech recognition, classification, and audio classification

Language: Jupyter Notebook - Size: 3.57 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 197 - Forks: 28

bricewalker/Hey-Jetson

Deep Learning based Automatic Speech Recognition with attention for the Nvidia Jetson.

Language: Jupyter Notebook - Size: 2.88 GB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 192 - Forks: 40

sovaai/sova-asr

SOVA ASR (Automatic Speech Recognition)

Language: Python - Size: 2.32 MB - Last synced at: about 2 months ago - Pushed at: over 2 years ago - Stars: 172 - Forks: 22

vilassn/whisper_android

Offline Speech Recognition with OpenAI Whisper and TensorFlow Lite for Android

Language: C++ - Size: 187 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 164 - Forks: 22

noco-ai/spellbook-docker

AI stack for interacting with LLMs, Stable Diffusion, Whisper, xTTS and many other AI models

Language: Shell - Size: 2.39 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 163 - Forks: 13

CoEDL/elpis

🙊 software for creating speech recognition models.

Language: Python - Size: 82.5 MB - Last synced at: 4 months ago - Pushed at: over 1 year ago - Stars: 159 - Forks: 33

biodatlab/thonburian-whisper

Thonburian Whisper: Open models for fine-tuned Whisper in Thai. Try our demo on Huggingface space:

Language: Jupyter Notebook - Size: 786 KB - Last synced at: about 2 months ago - Pushed at: 10 months ago - Stars: 144 - Forks: 16

anton-jeran/FAST-RIR

This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.

Language: Python - Size: 4.47 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 143 - Forks: 26

ieasybooks/tafrigh

Transcribe audio to text and generate SRT and VTT files using Whisper models and wit.ai technology.

Language: Python - Size: 631 KB - Last synced at: 32 minutes ago - Pushed at: 5 months ago - Stars: 135 - Forks: 18

tugstugi/mongolian-speech-recognition

Mongolian speech recognition with PyTorch

Language: Python - Size: 164 KB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 134 - Forks: 52

dangvansam/viet-asr

VietASR - Vietnamese Automatic Speech Recognition

Language: Python - Size: 289 MB - Last synced at: 4 months ago - Pushed at: 10 months ago - Stars: 130 - Forks: 54

at16k/at16k

Trained models for automatic speech recognition (ASR). A library to quickly build applications that require speech to text conversion.

Language: Python - Size: 268 KB - Last synced at: about 2 months ago - Pushed at: over 4 years ago - Stars: 129 - Forks: 18

lucasnewman/best-rq-pytorch

Implementation of BEST-RQ - a model for self-supervised learning of speech signals using a random projection quantizer, in PyTorch.

Language: Python - Size: 365 KB - Last synced at: 18 days ago - Pushed at: almost 2 years ago - Stars: 123 - Forks: 12

kmario23/KenLM-training

Training an n-gram based Language Model using KenLM toolkit for Deep Speech 2

Size: 5.86 KB - Last synced at: about 2 months ago - Pushed at: over 6 years ago - Stars: 114 - Forks: 21

andi611/ZeroSpeech-TTS-without-T

A PyTorch implementation for the ZeroSpeech 2019 challenge.

Language: Python - Size: 99.2 MB - Last synced at: 5 months ago - Pushed at: almost 6 years ago - Stars: 112 - Forks: 12

BatuhanYilmaz26/Auto-Subtitled-Video-Generator

Input a YouTube video link or upload a video file and get a video with subtitles.

Language: Python - Size: 122 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 87 - Forks: 36

j3soon/whisper-to-input

An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text; supports English, Chinese, Japanese, etc., and even mixed languages.

Language: Kotlin - Size: 3.31 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 86 - Forks: 13

mgonzs13/whisper_ros

Speech-to-Text based on SileroVAD + whisper.cpp (GGML Whisper) for ROS 2

Language: C++ - Size: 1.94 MB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 81 - Forks: 19

LearnedVector/Wav2Letter

Speech recognition model based on the FAIR research paper, built using PyTorch.

Language: Python - Size: 186 KB - Last synced at: over 2 years ago - Pushed at: over 6 years ago - Stars: 81 - Forks: 24

lkmeta/txtify

Web application that converts audio and video to text using AI, supporting various formats and self-hosting.

Language: Python - Size: 6.77 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 80 - Forks: 7

undertheseanlp/automatic_speech_recognition

Vietnamese Automatic Speech Recognition

Language: Python - Size: 131 MB - Last synced at: 2 months ago - Pushed at: over 6 years ago - Stars: 69 - Forks: 38

MingLunHan/CIF-PyTorch

[ICASSP 2020] CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition (A PyTorch implementation of Continuous Integrate-and-Fire mechanism).

Language: Python - Size: 106 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 68 - Forks: 6

khakers/go-subgen

Automatically generate subtitles for your media using whisper.cpp via webhooks with support for Radarr & Sonarr

Language: Go - Size: 7.61 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 66 - Forks: 1

PyThaiNLP/pythaiasr

Python Thai Automatic Speech Recognition

Language: Python - Size: 178 KB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 66 - Forks: 13

hirofumi0810/asr_preprocessing

Python implementation of pre-processing for End-to-End speech recognition

Language: Python - Size: 1.67 MB - Last synced at: over 2 years ago - Pushed at: over 7 years ago - Stars: 66 - Forks: 22

zmeet-ai/asr_demo

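Speech recognition API offering real-time speech recognition and offline recognition of uploaded long audio; supports real-time transcription and simultaneous interpretation for Chinese, English, and the languages of up to 100 countries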

Language: Java - Size: 23.1 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 63 - Forks: 6

01Zhangbw/Speech-and-audio-papers-Top-Conference

Papers in the speech and audio field. Currently covers: ICLR2025-2023, ICML2025-2023, NeurIPS2024-2023, ACMMM2024, AAAI2025-2024, ACL2025-2024, EMNLP2024, NAACL2025, IJCAI2024, ECCV2024

Size: 290 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 61 - Forks: 1

prateekralhan/OpenAI_Whisper_ASR

A minimalistic automatic speech recognition streamlit based webapp powered by OpenAI's Whisper "State of the Art" models

Language: Python - Size: 10.7 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 60 - Forks: 15

tsmdt/whisply

💬 Transcribe, translate, diarize, annotate and subtitle video (and audio) with Whisper on Win, Linux and Mac ... fast!

Language: Python - Size: 4.07 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 59 - Forks: 13

googlecreativelab/obvi 📦

A Polymer 3+ webcomponent / button for doing speech recognition

Language: JavaScript - Size: 6.69 MB - Last synced at: 9 days ago - Pushed at: 20 days ago - Stars: 59 - Forks: 16

jonatasgrosman/asrecognition

ASRecognition: just an easy-to-use library for Automatic Speech Recognition.

Language: Python - Size: 106 KB - Last synced at: 13 days ago - Pushed at: over 2 years ago - Stars: 51 - Forks: 5

archiki/Robust-E2E-ASR

This repository contains the code for our upcoming paper An Investigation of End-to-End Models for Robust Speech Recognition at ICASSP 2021.

Language: Python - Size: 141 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 46 - Forks: 10

brianlan/automatic-speech-recognition

Automatic Speech Recognition using Tensorflow

Language: Python - Size: 114 KB - Last synced at: over 1 year ago - Pushed at: about 8 years ago - Stars: 46 - Forks: 16

loretoparisi/hf-experiments

Experiments with Hugging Face 🔬 🤗

Language: Python - Size: 20.5 MB - Last synced at: 28 days ago - Pushed at: about 1 year ago - Stars: 44 - Forks: 5

double22a/asr_nlp_paper_code

Papers of ASR, Tools of ASR

Size: 655 MB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 40 - Forks: 9

sungnyun/ARMHuBERT

(Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT

Language: Python - Size: 4.52 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 39 - Forks: 6

saurabhchalke/whisper-meta-quest

Running speech-to-text in a Meta Quest headset using OpenAI's Whisper tiny model

Language: C# - Size: 98.1 MB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 39 - Forks: 3

30stomercury/Automatic-Speech-Recognition

End-to-End Speech Recognition Using Tensorflow

Language: Python - Size: 1.93 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 39 - Forks: 8

ttop32/wav2vec2-live-japanese-translator

Real-time Japanese speech recognition translator using wav2vec2

Language: Jupyter Notebook - Size: 926 KB - Last synced at: 4 days ago - Pushed at: about 3 years ago - Stars: 39 - Forks: 3

fabio-sim/Fast-SeamlessM4T-ONNX 📦

ONNX-compatible Fast SeamlessM4T—Massively Multilingual & Multimodal Machine Translation

Language: Python - Size: 371 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 37 - Forks: 0

pyyush/SpecAugment

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Language: Python - Size: 3.02 MB - Last synced at: over 2 years ago - Pushed at: about 5 years ago - Stars: 37 - Forks: 8

mozilla-ai/speech-to-text-finetune

Blueprint by Mozilla.ai for finetuning a Speech-To-Text model in your own language

Language: Python - Size: 5.24 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 36 - Forks: 4

soheil-mp/Speech-Recognition

End-to-End Speech Recognition using Neural Networks.

Language: Jupyter Notebook - Size: 15.5 MB - Last synced at: 7 days ago - Pushed at: about 1 year ago - Stars: 35 - Forks: 21

George0828Zhang/torch_cif

A fast parallel PyTorch implementation of the "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition" https://arxiv.org/abs/1905.11235.

Language: Python - Size: 167 KB - Last synced at: 16 days ago - Pushed at: over 1 year ago - Stars: 33 - Forks: 3

lucasgris/wav2vec4bp

Wav2vec resources and models for Brazilian Portuguese

Language: Jupyter Notebook - Size: 1.65 MB - Last synced at: 4 months ago - Pushed at: about 3 years ago - Stars: 33 - Forks: 2

loretoparisi/wave2vec-recognize-docker

Wave2vec 2.0 Recognize pipeline

Language: Python - Size: 33.2 KB - Last synced at: almost 2 years ago - Pushed at: over 4 years ago - Stars: 33 - Forks: 10

j3soon/speech-to-windows-input

Perform speech-to-text (STT/ASR) with Azure speech service and simulate keyboard to input the recognized text; Supports English, Chinese, Japanese, and more.

Language: C# - Size: 2.4 MB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 32 - Forks: 3

sooftware/jasper

PyTorch implementation of "Jasper: An End-to-End Convolutional Neural Acoustic Model" (INTERSPEECH 2019)

Language: Python - Size: 38.1 KB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 32 - Forks: 2

drumpt/SGEM

Official PyTorch implementation of SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization (INTERSPEECH 2023 Oral Presentation)

Language: Python - Size: 24.8 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 31 - Forks: 3

kssteven418/Q-ASR

[ICASSP'22] Integer-only Zero-shot Quantization for Efficient Speech Recognition

Language: Jupyter Notebook - Size: 41.9 MB - Last synced at: 5 months ago - Pushed at: almost 4 years ago - Stars: 31 - Forks: 2

GAMMA-UMD/TS-RIR

Translating Synthetic RIRs to Real RIRs

Language: Python - Size: 2.24 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 29 - Forks: 7

GAMMA-UMD/IR-GAN

Augmenting Room Impulse Response

Language: MATLAB - Size: 7.36 MB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 29 - Forks: 12

Srijith-rkr/KAUST-Whisper-Adapter

INTERSPEECH 23 - Refunction Whisper to recognize new tasks with adapters!

Language: Python - Size: 5.26 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 28 - Forks: 2

victor369basu/End2EndAutomaticSpeechRecognition

In this repository, I have developed an end to end Automatic speech recognition project. I have developed the neural network model for automatic speech recognition with PyTorch and used MLflow to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.

Language: Python - Size: 4.13 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 28 - Forks: 11

egorsmkv/asr-corpus-creator 📦

This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.

Language: Python - Size: 2.47 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 27 - Forks: 3

csikasote/BembaSpeech

This is an ASR corpus for the Bemba language. It contains read speech from diverse publicly available Bemba sources: literature books, radio/TV show transcripts, YouTube video transcripts, and online sources. The corpus has 14,438 utterances totaling over 24 hours of speech.

Size: 2.41 GB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 27 - Forks: 2

Anwarvic/Arabic-Speech-Recognition

This repository contains my attempt to use two famous speech recognition frameworks (Kaldi, CMU Sphinx4) for Arabic Language using the publicly-available dataset "Arabic Corpus of Isolated Words"

Language: Shell - Size: 3.24 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 27 - Forks: 10

oleges1/quartznet-pytorch

QuartzNet implementation in PyTorch [https://arxiv.org/abs/1910.10261]

Language: Jupyter Notebook - Size: 116 KB - Last synced at: 9 months ago - Pushed at: about 4 years ago - Stars: 26 - Forks: 7

exemplaryai/ai-engine

Easy to use Multi-Provider ASR/Speech To Text and NLP engine

Size: 5.15 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 25 - Forks: 0

Livyatan-melvillei/ai-clips-maker

AI-powered tool to turn long videos into short, viral-ready clips. Combines transcription, speaker diarization, scene detection & 9:16 resizing — perfect for creators & smart automation.

Language: Python - Size: 69.3 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 24 - Forks: 3

pariajm/sharif-emotional-speech-dataset

A large-scale validated database for Persian speech emotion detection.

Size: 13.4 MB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 24 - Forks: 9

gary083/GAN_Harmonized_with_HMMs

Code: Completely Unsupervised Speech Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Models

Language: Shell - Size: 8.53 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 24 - Forks: 5

The-Data-Dilemma/MediBeng-Whisper-Tiny

MediBeng Whisper Tiny improves doctor-patient transcription by training the Whisper Tiny model to translate mixed Bengali-English speech into English, making it easier for analysis, record-keeping, and using AI in healthcare.

Language: Python - Size: 2.24 MB - Last synced at: 18 days ago - Pushed at: about 2 months ago - Stars: 23 - Forks: 2

srinivr/kaldi-long-audio-alignment

Long audio alignment using Kaldi

Language: Shell - Size: 26.4 KB - Last synced at: 3 months ago - Pushed at: over 4 years ago - Stars: 23 - Forks: 10

ckaytev/tgisper

Telegram bot with ASR

Language: Python - Size: 125 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 22 - Forks: 3

matusstas/openai-whisper-microservice

This is an OpenAI Whisper automatic speech recognition microservice

Language: Python - Size: 791 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 22 - Forks: 2

bbc/bbc-speech-segmenter

A complete speech segmentation system using Kaldi and x-vectors for voice activity detection (VAD) and speaker diarisation.

Language: Shell - Size: 62.6 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 22 - Forks: 2

popcornell/MicRank

MicRank is a Learning to Rank neural channel selection framework where a DNN is trained to rank microphone channels.

Language: Python - Size: 76.2 KB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 22 - Forks: 4

chimechallenge/chime-utils

Scripts for data generation, scoring and data manifest preparation for CHiME-8 DASR task.

Language: Python - Size: 2.63 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 21 - Forks: 3

Anwarvic/RasaChatbot-with-ASR-and-TTS

This repository contains an attempt to integrate Rasa Chatbot with state-of-the-art ASR (Automatic Speech Recognition) and TTS (Text-to-Speech) models directly, without the need to run additional servers or socket connections.

Language: JavaScript - Size: 6.45 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 20 - Forks: 8

stefanpantic/asr

Automatic speech recognition using neural networks

Language: Python - Size: 143 MB - Last synced at: 7 months ago - Pushed at: almost 5 years ago - Stars: 19 - Forks: 1

egorsmkv/whisper-ukrainian 📦

Trainer and Evaluation scripts for fine-tuning Whisper models for the Ukrainian language

Language: Python - Size: 69.3 KB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 18 - Forks: 0

gheyret/uyghur-asr-ctc

Speech Recognition for Uyghur using deep learning

Language: Python - Size: 6.6 MB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 18 - Forks: 3

Related Topics
speech-recognition (136), asr (135), speech-to-text (121), deep-learning (76), whisper (49), machine-learning (43), audio (42), python (41), speech (38), pytorch (32), dataset (31), stt (25), voice-recognition (23), text-to-speech (21), speech-synthesis (21), tts (20), asr-model (19), natural-language-processing (19), tensorflow (18), deep-neural-networks (18), openai (17), speech-processing (16), audio-processing (16), huggingface (16), transcription (16), kaldi (14), wav2vec2 (13), artificial-intelligence (12), transformers (12), wav (11), transformer (11), nlp (10), translation (9), ctc (9), fine-tuning (9), language-model (9), whisper-ai (9), docker (9), huggingface-transformers (9), ai (9), kaldi-asr (8), openai-whisper (8), speech-enhancement (7), python3 (7), neural-network (7), librispeech (7), deepspeech (6), faster-whisper (6), android (6), cnn (6), jasper (6), keras (6), attention-mechanism (6), rnn (6), speech-translation (5), machine-translation (5), ctc-loss (5), deepspeech2 (5), voice (5), end-to-end (5), mfcc (5), conversational-ai (5), conformer (5), neural-networks (5), open-source (5), speaker-recognition (5), voice-activity-detection (5), youtube (5), vosk (5), large-language-models (5), recurrent-neural-networks (5), wer (5), word-error-rate (5), language-modeling (4), nlp-machine-learning (4), tensorflow2 (4), subtitles-generator (4), real-time (4), common-voice (4), inference (4), quartznet (4), convolutional-neural-networks (4), low-resource-languages (4), speaker-diarization (4), synthetic-data (4), vad (4), data-analysis (4), rnn-transducer (4), pytorch-lightning (4), nvidia (4), timit-dataset (4), tflite (4), subtitles (4), data-science (3), lip-reading (3), sequence-to-sequence (3), attention (3), chinese-speech-recognition (3), ukrainian (3), dnn (3)