Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: automatic-speech-recognition

chimechallenge/chime-utils

Scripts for data generation, scoring and data manifest preparation for CHiME-8 DASR task.

Language: Python - Size: 2.51 MB - Last synced: about 17 hours ago - Pushed: about 20 hours ago - Stars: 13 - Forks: 2

EricApgar/live-speech-to-text

Live speech to text transcription.

Language: Python - Size: 214 KB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 0 - Forks: 0

EmulationAI/awesome-large-audio-models

Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.

Size: 6.54 MB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 397 - Forks: 26

winstxnhdw/CapGen

A fast CPU-first video/audio transcriber for generating caption files with Whisper and CTranslate2, hosted on Hugging Face Spaces.

Language: Python - Size: 546 KB - Last synced: about 21 hours ago - Pushed: 1 day ago - Stars: 1 - Forks: 0

zzw922cn/awesome-speech-recognition-speech-synthesis-papers

Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)

Size: 197 KB - Last synced: 1 day ago - Pushed: 7 months ago - Stars: 2,881 - Forks: 506

leduckhai/MultiMed

Multilingual Multitask Multipurpose Medical Speech Recognition

Language: Python - Size: 6.26 MB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 8 - Forks: 7

TensorSpeech/TensorFlowASR

:zap: TensorFlowASR: Almost State-of-the-art Automatic Speech Recognition in Tensorflow 2. Supported languages that can use characters or subwords

Language: Python - Size: 89.3 MB - Last synced: about 19 hours ago - Pushed: 2 days ago - Stars: 903 - Forks: 244

bricewalker/Hey-Jetson

Deep Learning based Automatic Speech Recognition with attention for the Nvidia Jetson.

Language: Jupyter Notebook - Size: 2.88 GB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 192 - Forks: 40

Srijith-rkr/KAUST-Whisper-Adapter

INTERSPEECH 23 - Refunction Whisper to recognize new tasks with adapters!

Language: Python - Size: 5.26 MB - Last synced: 3 days ago - Pushed: 8 months ago - Stars: 28 - Forks: 2

shirayu/whispering 📦

Streaming transcriber with whisper

Language: Python - Size: 288 KB - Last synced: 3 days ago - Pushed: about 1 year ago - Stars: 679 - Forks: 57

NavodPeiris/speechlib

speechlib is a library that can do speaker diarization, transcription and speaker recognition on an audio file to create transcripts with actual speaker names

Language: Python - Size: 31.3 MB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 90 - Forks: 5

MooersLab/bash-whisper-transcription

Bash function to ease the transcription of audio files with OpenAI's whisper.

Language: Python - Size: 70.3 KB - Last synced: 5 days ago - Pushed: 5 days ago - Stars: 1 - Forks: 1

jitsi/jiwer

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)

Language: Python - Size: 762 KB - Last synced: 3 days ago - Pushed: 10 days ago - Stars: 543 - Forks: 89

matiuste/DistriBlock

[UAI 2024] DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution.

Size: 275 KB - Last synced: 6 days ago - Pushed: 7 days ago - Stars: 0 - Forks: 0

QubitPi/cmusphinx.github.io (fork of cmusphinx/cmusphinx.github.io)

CMUSphinx Website

Language: HTML - Size: 17.7 MB - Last synced: 7 days ago - Pushed: 8 days ago - Stars: 0 - Forks: 0

George0828Zhang/torch_cif

A fast parallel PyTorch implementation of the "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition" https://arxiv.org/abs/1905.11235.

Language: Python - Size: 167 KB - Last synced: 6 days ago - Pushed: 3 months ago - Stars: 29 - Forks: 3

kakaobrain/pororo 📦

PORORO: Platform Of neuRal mOdels for natuRal language prOcessing

Language: Python - Size: 12.8 MB - Last synced: 5 days ago - Pushed: about 2 years ago - Stars: 1,257 - Forks: 224

YoavRamon/awesome-kaldi

This is a list of features, scripts, blogs and resources for better using Kaldi ( http://kaldi-asr.org/ )

Size: 18.6 KB - Last synced: 3 days ago - Pushed: over 2 years ago - Stars: 531 - Forks: 85

th-schmidt/whisply

Transcribe, diarize, annotate and subtitle audio and video with Whisper ... fast!

Language: Python - Size: 116 KB - Last synced: 8 days ago - Pushed: 9 days ago - Stars: 2 - Forks: 1

LD239/WebTranscript

Interactive web tool for automatically ⚙️ transcribing and subtitling videos from URL or file uploads in your chosen language. The transcript appears alongside the video player, complete with embedded subtitles.

Language: JavaScript - Size: 3.72 MB - Last synced: 8 days ago - Pushed: 9 days ago - Stars: 1 - Forks: 0

JarbasAl/pocketsphinx-models-mirror

pocketsphinx models for languages originating from the iberian peninsula

Size: 337 MB - Last synced: 7 days ago - Pushed: over 3 years ago - Stars: 8 - Forks: 4

sungnyun/ARMHuBERT

(Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT

Language: Python - Size: 4.51 MB - Last synced: 3 days ago - Pushed: 21 days ago - Stars: 31 - Forks: 4

archiki/ASR-Accent-Analysis

Analyzing and investigating the confounding effect of accents in end-to-end Automatic Speech Recognition models.

Language: Jupyter Notebook - Size: 9.84 MB - Last synced: 13 days ago - Pushed: almost 4 years ago - Stars: 14 - Forks: 5

coqui-ai/STT

๐ŸธSTT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.

Language: C++ - Size: 53.4 MB - Last synced: 14 days ago - Pushed: 2 months ago - Stars: 2,144 - Forks: 258

Picovoice/leopard

On-device speech-to-text engine powered by deep learning

Language: Python - Size: 418 MB - Last synced: 14 days ago - Pushed: 14 days ago - Stars: 408 - Forks: 23

prateekralhan/Automatic-End-to-End-Speech-Recognition-using-pytorch

ASR using Pytorch and huggingface transformers

Language: Python - Size: 886 KB - Last synced: 15 days ago - Pushed: almost 2 years ago - Stars: 3 - Forks: 1

ahmetoner/whisper-asr-webservice

OpenAI Whisper ASR Webservice API

Language: Python - Size: 1.28 MB - Last synced: 15 days ago - Pushed: 20 days ago - Stars: 1,652 - Forks: 301

noco-ai/spellbook-docker

AI stack for interacting with LLMs, Stable Diffusion, Whisper, xTTS and many other AI models

Language: Shell - Size: 2.39 MB - Last synced: 14 days ago - Pushed: 15 days ago - Stars: 105 - Forks: 5

jonatasgrosman/huggingsound

HuggingSound: A toolkit for speech-related tasks based on Hugging Face's tools

Language: Python - Size: 598 KB - Last synced: 15 days ago - Pushed: 8 months ago - Stars: 415 - Forks: 42

DevTae/SpeechFeedback

A Korean pronunciation correction system based on Docker, speech recognition AI, and FastAPI

Language: HTML - Size: 5.68 MB - Last synced: 16 days ago - Pushed: 16 days ago - Stars: 7 - Forks: 1

Picovoice/cheetah

On-device streaming speech-to-text engine powered by deep learning

Language: Python - Size: 79.7 MB - Last synced: 14 days ago - Pushed: 14 days ago - Stars: 555 - Forks: 66

PyThaiNLP/pythaiasr

Python Thai Automatic Speech Recognition

Language: Python - Size: 178 KB - Last synced: 14 days ago - Pushed: about 1 year ago - Stars: 51 - Forks: 13

DeepTranscript/deeptranscript-demo

API integration examples

Language: JavaScript - Size: 2.03 MB - Last synced: 19 days ago - Pushed: over 1 year ago - Stars: 2 - Forks: 0

hirofumi0810/tensorflow_end2end_speech_recognition

End-to-End speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)

Language: Python - Size: 4.17 MB - Last synced: 18 days ago - Pushed: over 6 years ago - Stars: 312 - Forks: 123

ieasybooks/tafrigh

ุชูุฑูŠุบ ุงู„ู…ูˆุงุฏ ุงู„ู…ุฑุฆูŠุฉ ุฃูˆ ุงู„ู…ุณู…ูˆุนุฉ ุฅู„ู‰ ู†ุตูˆุต

Language: Python - Size: 133 KB - Last synced: 14 days ago - Pushed: 4 months ago - Stars: 83 - Forks: 9

fabio-sim/Fast-SeamlessM4T-ONNX 📦

ONNX-compatible Fast SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

Language: Python - Size: 371 KB - Last synced: 19 days ago - Pushed: 9 months ago - Stars: 37 - Forks: 0

PeterGilles/Speech-Recognition-Lecture---Data-Science-in-Humanities

Material for my lecture on Automatic Speech Recognition

Language: Jupyter Notebook - Size: 26.4 MB - Last synced: 21 days ago - Pushed: 22 days ago - Stars: 0 - Forks: 0

lexust1/av2txtsum

Automatic speech recognition (ASR)

Language: HTML - Size: 694 KB - Last synced: 23 days ago - Pushed: 24 days ago - Stars: 0 - Forks: 0

ogunlao/asr_stat_significance

Performs statistical significance test between two ASR models using bootstrap or blockwise bootstrap sampling.

Language: Python - Size: 22.5 KB - Last synced: 3 days ago - Pushed: 7 months ago - Stars: 3 - Forks: 0

at16k/at16k

Trained models for automatic speech recognition (ASR). A library to quickly build applications that require speech to text conversion.

Language: Python - Size: 268 KB - Last synced: 26 days ago - Pushed: about 3 years ago - Stars: 130 - Forks: 19

Nexdata-AI/1796-Hours-German-Speech-Data-by-Mobile-Phone

German Speech Dataset

Size: 457 KB - Last synced: 28 days ago - Pushed: 28 days ago - Stars: 1 - Forks: 0

Nexdata-AI/201-Hours-North-American-English-Speech-Data-by-Mobile-Phone-and-PC

North American English Speech Dataset

Size: 3.91 KB - Last synced: 28 days ago - Pushed: 28 days ago - Stars: 1 - Forks: 0

Nexdata-AI/20.1-Hours-Chinese-Mandarin-Synthesis-Corpus-Male-Customer-Service

Chinese Mandarin Synthesis Corpus-Male

Size: 3.91 KB - Last synced: 29 days ago - Pushed: 29 days ago - Stars: 1 - Forks: 0

Nexdata-AI/26.1-Hours-Chinese-Mandarin-Synthesis-Corpus-Female-Customer-Service

Chinese Mandarin Synthesis Corpus-Customer Service

Size: 3.91 KB - Last synced: 29 days ago - Pushed: 29 days ago - Stars: 1 - Forks: 0

archiki/Robust-E2E-ASR

This repository contains the code for our upcoming paper An Investigation of End-to-End Models for Robust Speech Recognition at ICASSP 2021.

Language: Python - Size: 136 KB - Last synced: 13 days ago - Pushed: about 3 years ago - Stars: 44 - Forks: 10

undertheseanlp/automatic_speech_recognition

Vietnamese Automatic Speech Recognition

Language: Python - Size: 131 MB - Last synced: 30 days ago - Pushed: over 5 years ago - Stars: 61 - Forks: 37

mathusanm6/Amaze-Voice-Lab

The goal of this research project is to be able to control the movements of characters in a Maze game using real-time voice commands such as saying out loud Up, Down, Left or Right.

Language: Java - Size: 65.8 MB - Last synced: about 1 month ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

abus-aikorea/studio-free

youtube download, vocal remover, vocal extraction, karaoke video production, STT, automatic speech recognition, transcription, automatic subtitle, AI, yt-dlp, demucs, whisper, webui, gradio, windows

Language: Python - Size: 8.98 MB - Last synced: 29 days ago - Pushed: 29 days ago - Stars: 8 - Forks: 0

sovaai/sova-asr

SOVA ASR (Automatic Speech Recognition)

Language: Python - Size: 2.32 MB - Last synced: 19 days ago - Pushed: about 1 year ago - Stars: 167 - Forks: 19

lucasnewman/best-rq-pytorch

Implementation of BEST-RQ - a model for self-supervised learning of speech signals using a random projection quantizer, in Pytorch.

Language: Python - Size: 365 KB - Last synced: 13 days ago - Pushed: 8 months ago - Stars: 63 - Forks: 6

CoEDL/elpis

🙊 software for creating speech recognition models.

Language: Python - Size: 82.5 MB - Last synced: 28 days ago - Pushed: 8 months ago - Stars: 150 - Forks: 30

ArthurFDLR/whisper-youtube

🔉 Youtube Videos Transcription with OpenAI's Whisper

Language: Jupyter Notebook - Size: 124 KB - Last synced: about 1 month ago - Pushed: 4 months ago - Stars: 312 - Forks: 101

snakers4/open_stt 📦

Open STT

Language: Python - Size: 87.9 KB - Last synced: about 1 month ago - Pushed: about 2 years ago - Stars: 763 - Forks: 80

ECNU-Cross-Innovation-Lab/ENT

[ICASSP 2024] Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition

Language: Python - Size: 638 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 7 - Forks: 0

jmaczan/asr-dysarthria

😺 Research on Automatic Speech Recognition for dysarthric speech

Language: Jupyter Notebook - Size: 725 KB - Last synced: 23 days ago - Pushed: about 1 month ago - Stars: 2 - Forks: 0

inferless/Distil-whisper-large-v2

Distil-Whisper is a distilled version of the Whisper model that is 6 times faster, 49% smaller, and performs within 1% WER on out-of-distribution evaluation sets. This is the repository for distil-large-v2, a distilled variant of Whisper large-v2.

Language: Python - Size: 10.7 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

kmario23/KenLM-training

Training an n-gram based Language Model using KenLM toolkit for Deep Speech 2

Size: 5.86 KB - Last synced: 24 days ago - Pushed: almost 5 years ago - Stars: 110 - Forks: 21

koudounasalkis/Divergences-in-Apollo-Missions

This repo contains the code for "Houston we have a Divergence: A Subgroup Performance Analysis of ASR Models"

Language: Jupyter Notebook - Size: 602 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

analyticsinmotion/werpy

๐Ÿ๐Ÿ“ฆ Rapidly calculate and analyze the Word Error Rate (WER) with this powerful yet lightweight Python package.

Language: Python - Size: 415 KB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 9 - Forks: 2

MLLP-Research-Group/Europarl-ASR

A 1300-hour English speech and text corpus of parliamentary debates for streaming ASR training and benchmarking, speech data filtering and speech data verbatimization.

Size: 41 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 2 - Forks: 0

roboticslab-uc3m/speech

Text To Speech (TTS) and Automatic Speech Recognition (ASR).

Language: Python - Size: 71.8 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 3 - Forks: 4

Darveivoldavara/whisper_model_evaluator (fork of format37/vosk_model_evaluator)

Comparator of WER, MER, and WIL for Whisper, Vosk, and Google transcribers

Size: 1.38 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

inferless/Distil-whisper-large-v3

Distil-Whisper: distil-large-v3 is the third and final installment of the Distil-Whisper English series. It is the knowledge-distilled version of OpenAI's Whisper large-v3, the latest and most performant Whisper model to date.

Language: Python - Size: 12.7 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

PatrickTourniaire/ASR-Exam-Revision

ASR course past paper revision work for the University of Edinburgh

Language: TeX - Size: 6.99 MB - Last synced: about 2 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

zhaoyu611/Automatic_Speech_Recognition_with_Multi_Models

A Simple Automatic Speech Recognition (ASR) Model in Tensorflow, which only needs to focus on Deep Neural Network. It's easy to test popular cells (most are LSTM and its variants) and models (unidirectional RNN, bidirectional RNN, ResNet and so on). Moreover, you are welcome to play with self-defined cells or models.

Language: Python - Size: 981 MB - Last synced: about 2 months ago - Pushed: over 6 years ago - Stars: 18 - Forks: 8

biodatlab/thonburian-whisper

Thonburian Whisper: Open models for fine-tuned Whisper in Thai. Try our demo on Huggingface space:

Language: Jupyter Notebook - Size: 333 KB - Last synced: about 2 months ago - Pushed: 3 months ago - Stars: 63 - Forks: 4

wenet-e2e/wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit

Language: Python - Size: 22.7 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 3,606 - Forks: 993

zzw922cn/Automatic_Speech_Recognition

End-to-end Automatic Speech Recognition for Mandarin and English in Tensorflow

Language: Python - Size: 5.53 MB - Last synced: 2 months ago - Pushed: about 1 year ago - Stars: 2,835 - Forks: 539

Anwarvic/RasaChatbot-with-ASR-and-TTS

This repository contains an attempt to incorporate Rasa Chatbot with state-of-the-art ASR (Automatic Speech Recognition) and TTS (Text-to-Speech) models directly without the need of running additional servers or socket connections.

Language: JavaScript - Size: 6.45 MB - Last synced: 15 days ago - Pushed: over 4 years ago - Stars: 20 - Forks: 8

goodmike31/pl-asr-speech-data-survey

Survey of available speech datasets for Polish ASR development

Language: Python - Size: 263 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 4 - Forks: 0

zahraDehghanian97/ASR_Journey

In this repo, we will use and test different models to automatically transcribe Persian speech to text. This task is also called Automatic Speech Recognition (ASR) for the Persian language.

Language: Jupyter Notebook - Size: 1.86 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 1 - Forks: 1

Darveivoldavara/whisper-timestamped

Timestamped ASR microservice

Language: Jupyter Notebook - Size: 3.6 MB - Last synced: 25 days ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

sinaahmadi/CORDI

Language and Speech Technology for Central Kurdish Varieties (LREC-COLING 2024)

Language: Python - Size: 25.9 MB - Last synced: 15 days ago - Pushed: about 2 months ago - Stars: 8 - Forks: 1

mydroidandi/commbase-stt-whisper-reactive-p

A reactive version of STT engine with remote for Commbase

Language: Python - Size: 370 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 2 - Forks: 0

Anwarvic/Arabic-Speech-Recognition

This repository contains my attempt to use two famous speech recognition frameworks (Kaldi, CMU Sphinx4) for Arabic Language using the publicly-available dataset "Arabic Corpus of Isolated Words"

Language: Shell - Size: 3.24 MB - Last synced: 15 days ago - Pushed: over 4 years ago - Stars: 27 - Forks: 10

double22a/speech_dataset

The dataset of Speech Recognition

Size: 62.5 KB - Last synced: 2 months ago - Pushed: about 1 year ago - Stars: 333 - Forks: 66

mydroidandi/commbase-stt-whisper-proactive-p

A proactive version of STT engine for Commbase

Language: Python - Size: 373 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 1 - Forks: 0

ksquarekumar/whisper-stream

Whisper Transcription Service

Language: Jupyter Notebook - Size: 6.21 MB - Last synced: 2 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0

bbc/bbc-speech-segmenter

A complete speech segmentation system using Kaldi and x-vectors for voice activity detection (VAD) and speaker diarisation.

Language: Shell - Size: 62.6 MB - Last synced: about 1 month ago - Pushed: almost 2 years ago - Stars: 22 - Forks: 2

ksm26/Serverless-LLM-apps-with-Amazon-Bedrock

The course equips you with the skills to deploy Large Language Model (LLM)-based applications into production using serverless technology with Amazon Bedrock.

Language: Jupyter Notebook - Size: 1.66 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 2

sooftware/jasper

PyTorch implementation of "Jasper: An End-to-End Convolutional Neural Acoustic Model" (INTERSPEECH 2019)

Language: Python - Size: 38.1 KB - Last synced: 15 days ago - Pushed: about 3 years ago - Stars: 30 - Forks: 2

hirofumi0810/neural_sp

End-to-end ASR/LM implementation with PyTorch

Language: Python - Size: 8.66 MB - Last synced: 2 months ago - Pushed: over 2 years ago - Stars: 582 - Forks: 138

prateekralhan/OpenAI_Whisper_ASR

A minimalistic automatic speech recognition streamlit based webapp powered by OpenAI's Whisper "State of the Art" models

Language: Python - Size: 10.7 MB - Last synced: 15 days ago - Pushed: over 1 year ago - Stars: 60 - Forks: 15

csikasote/BembaSpeech

This is an ASR corpus for the Bemba language. It contains read speech from diverse publicly available Bemba sources: literature books, radio/TV show transcripts, YouTube video transcripts, and online sources. The corpus has 14,438 utterances amounting to over 24 hours of speech.

Size: 2.41 GB - Last synced: 3 months ago - Pushed: about 1 year ago - Stars: 27 - Forks: 2

egorsmkv/asr-corpus-creator 📦

This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.

Language: Python - Size: 2.47 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 27 - Forks: 3

smeetrs/deep_avsr

A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.

Language: Python - Size: 42 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 169 - Forks: 39

googlecreativelab/obvi

A Polymer 3+ webcomponent / button for doing speech recognition

Language: JavaScript - Size: 6.6 MB - Last synced: 17 days ago - Pushed: 4 months ago - Stars: 57 - Forks: 16

OVOSHatchery/ovos-stt-plugin-pocketsphinx

pocketsphinx STT plugin for mycroft

Language: Python - Size: 25.4 KB - Last synced: 3 days ago - Pushed: 3 months ago - Stars: 1 - Forks: 1

krylm/whisper-event-tuning

Final training script from HuggingFace Whisper Fine tuning event - to get best results on finetuned model.

Language: Python - Size: 9.77 KB - Last synced: 18 days ago - Pushed: over 1 year ago - Stars: 12 - Forks: 2

j3soon/whisper-to-input

An Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text. Supports English, Chinese, Japanese, etc., and even mixed languages.

Language: Kotlin - Size: 3.05 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 4 - Forks: 0

victor369basu/End2EndAutomaticSpeechRecognition

In this repository, I have developed an end to end Automatic speech recognition project. I have developed the neural network model for automatic speech recognition with PyTorch and used MLflow to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.

Language: Python - Size: 4.13 MB - Last synced: 15 days ago - Pushed: almost 3 years ago - Stars: 28 - Forks: 11

gopiashokan/Voice-AI-Automatic-Speech-Recognition

Developed a Marathi speech-to-text application using the Hugging Face whisper ASR models. Trained the model with a custom audio dataset and fine-tuned it for optimized performance. Deployed the model on the Hugging Face Model Hub, achieving a WER of 0.74 for the base model.

Language: Jupyter Notebook - Size: 2.02 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

linto-ai/linto-agent

LinTO platform services stack deployment tool for Docker Swarm cluster

Language: JavaScript - Size: 1.01 MB - Last synced: about 1 month ago - Pushed: 3 months ago - Stars: 14 - Forks: 1

rolczynski/Automatic-Speech-Recognition 📦

🎧 Automatic Speech Recognition: DeepSpeech & Seq2Seq (TensorFlow)

Language: Python - Size: 3.6 MB - Last synced: 29 days ago - Pushed: almost 4 years ago - Stars: 222 - Forks: 66

mydroidandi/commbase-stt-vosk-p 📦

An ASR (Automatic Speech Recognition) engine. It is capable of converting spoken language into written text without requiring an internet connection, making it a reliable and secure solution for any application that needs speech-to-text functionality

Language: Python - Size: 3.85 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 1 - Forks: 0

anondo1969/deep-speech-vis

(Master's Thesis) Alam, Mahbub Ul, From Speech to Image: A Novel Approach to Understand the Hidden Layer Mechanisms of Deep Neural Networks in Automatic Speech Recognition, Masterarbeit, Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart, 2017. (https://www.ims.uni-stuttgart.de/en/research/publications/theses/)

Language: Python - Size: 63.5 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

matusstas/openai-whisper-microservice

This is an OpenAI Whisper automatic speech recognition microservice

Language: Python - Size: 786 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 13 - Forks: 1

bytewife/automatic-speech-recognition-guide

Information on the ASR process, but the model needs your help!

Language: Jupyter Notebook - Size: 1.42 MB - Last synced: 4 months ago - Pushed: almost 3 years ago - Stars: 0 - Forks: 0

ckaytev/tgisper

Telegram bot with ASR

Language: Python - Size: 71.3 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 18 - Forks: 1

anton-jeran/FAST-RIR

This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.

Language: Python - Size: 4.47 MB - Last synced: 4 months ago - Pushed: 6 months ago - Stars: 128 - Forks: 21

Related Keywords
automatic-speech-recognition (275), speech-recognition (110), asr (105), speech-to-text (98), deep-learning (68), audio (37), machine-learning (36), whisper (34), speech (31), python (29), dataset (29), pytorch (27), stt (24), voice-recognition (19), speech-synthesis (19), asr-model (17), tts (17), deep-neural-networks (16), tensorflow (16), openai (15), text-to-speech (15), transcription (13), natural-language-processing (12), wav2vec2 (12), kaldi (12), speech-processing (12), huggingface (11), wav (10), audio-processing (10), kaldi-asr (8), transformer (8), neural-network (8), ctc (7), translation (7), nlp (7), huggingface-transformers (7), librispeech (7), transformers (7), whisper-ai (6), language-model (6), android (6), attention-mechanism (6), docker (6), deepspeech (6), keras (6), artificial-intelligence (5), neural-networks (5), jasper (5), fine-tuning (5), python3 (5), rnn (5), cnn (5), ctc-loss (5), speech-enhancement (5), youtube (5), mfcc (5), vosk (5), voice (4), end-to-end (4), openai-whisper (4), ai (4), conversational-ai (4), subtitles (4), speaker-recognition (4), word-error-rate (4), wer (4), deepspeech2 (4), quartznet (4), timit-dataset (4), tflite (3), conformer (3), pytorch-lightning (3), kenlm (3), deep-speech (3), streamlit (3), music-information-retrieval (3), end-to-end-learning (3), speech-emotion-recognition (3), speech-translation (3), generative-adversarial-network (3), speech-recognizer (3), room-impulse-response (3), vietnamese-nlp (3), engine (3), commbase (3), synthetic-data (3), wav2letter (3), tensorflow2 (3), signal-processing (3), lstm (3), chinese-speech-recognition (3), voice-assistant (3), lip-reading (3), timit (3), low-resource-languages (3), rnn-transducer (3), pocketsphinx (3), recurrent-neural-networks (3), subtitles-generator (3), sentiment-analysis (3)