GitHub topics: audio-visual-speech-recognition

Repositories

modelscope/FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Language: Python - Size: 100 MB - Last synced at: 2 days ago - Pushed at: 21 days ago - Stars: 11,014 - Forks: 1,110

smeetrs/deep_avsr

A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.

Language: Python - Size: 45.9 KB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 232 - Forks: 41

aidayang/FunASR-OneClick

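FunASR real-time speech recognition edition: recognizes audio from the microphone and audio played on the computer; voice-typing software for PC.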

Size: 22.5 KB - Last synced at: 16 days ago - Pushed at: 17 days ago - Stars: 4 - Forks: 0

ankurbhatia24/MULTIMODAL-EMOTION-RECOGNITION

Human Emotion Understanding using multimodal dataset.

Language: Jupyter Notebook - Size: 5.72 MB - Last synced at: about 21 hours ago - Pushed at: almost 5 years ago - Stars: 98 - Forks: 24

sungnyun/cav2vec

(ICLR 2025) Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation

Language: Python - Size: 3.55 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 8 - Forks: 0

zulfiqar-ali01/audio-visual-Transcription

Real-Time Audio-Visual Speech Recognition

Language: Python - Size: 33.1 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 4 - Forks: 0

umbertocappellazzo/Llama-AVSR

[ICASSP 2025] Official PyTorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners".

Language: Python - Size: 20.5 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

sungnyun/avsr-temporal-dynamics

(SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition

Language: Python - Size: 7.8 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 9 - Forks: 0

luomingshuang/lipreading_with_icefall

In this repository, I try to use k2, icefall and Lhotse for lip reading. I will modify it for the lip reading task. Many different lip-reading datasets should be added. -_-

Language: Python - Size: 2.69 MB - Last synced at: 4 months ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

Sreyan88/LipGER

Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition

Language: Python - Size: 1.16 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 8 - Forks: 1

lzuwei/end-to-end-multiview-lipreading

End to End Multiview Lip Reading

Language: Python - Size: 168 KB - Last synced at: 2 months ago - Pushed at: over 7 years ago - Stars: 10 - Forks: 2

georgesterpu/Taris

Transformer-based online speech recognition system with TensorFlow 2

Language: Python - Size: 5.4 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 25 - Forks: 6

hmeutzner/kaldi-avsr

Kaldi-based audio-visual speech recognition

Language: Shell - Size: 41 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 6 - Forks: 6

Remi-Gau/McGurk_prior_code

Code related to the fMRI experiment on the contextual modulation of the McGurk Effect

Language: MATLAB - Size: 928 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 2

karlsimsBBC/cassette-bot

🤖 📼 Command-line tool for remixing videos with time-coded transcriptions.

Language: Python - Size: 25 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 1

Related Keywords

audio-visual-speech-recognition (15) - speech-recognition (6) - deep-learning (3) - audio-visual (3) - visual-speech-recognition (3) - tensorflow (2) - python (2) - lip-reading (2) - whisper (2) - voice-activity-detection (2) - vad (2) - speechllm (2) - speechgpt (2) - speaker-diarization (2) - rnnt (2) - pytorch (2) - dfsmn (2) - punctuation (2) - conformer (2) - paraformer (2) - taris (1) - speech-recognizer (1) - online (1) - multimodal-deep-learning (1) - multimodal (1) - mahcine-learning (1) - live-caption (1) - end-to-end-learning (1) - prompting (1) - automatic-speech-recognition (1) - tensorflow2 (1) - transformer (1) - asr (1) - avsr (1) - deep-neural-networks (1) - kaldi (1) - fmri (1) - fmri-data-analysis (1) - multisensory-integration (1) - text-to-video (1) - video (1) - speech-to-text (1) - funasr (1) - pretrained-models (1) - pretrained-model (1) - audio-visualization (1) - deeplearning (1) - keras (1) - librosa (1) - machine-learning (1) - multimodal-emotion-recognition (1) - opensmile (1) - noise-robustness (1) - self-supervised-learning (1) - audio-processing (1) - realtime-analytics (1) - large-language-models (1) - icefall (1) - k2 (1) - generative-ai (1) - llm (1)
