GitHub topics: audio-visual-speech-recognition
smeetrs/deep_avsr
A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.
Language: Python - Size: 45.9 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 232 - Forks: 41

aidayang/FunASR-OneClick
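FunASR real-time speech recognition edition: recognizes audio from the microphone and audio played on the computer; voice-typing software for PC.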
Size: 22.5 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 4 - Forks: 0

modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Language: Python - Size: 100 MB - Last synced at: 11 days ago - Pushed at: 14 days ago - Stars: 10,758 - Forks: 1,084

ankurbhatia24/MULTIMODAL-EMOTION-RECOGNITION
Human Emotion Understanding using multimodal dataset.
Language: Jupyter Notebook - Size: 5.72 MB - Last synced at: 3 days ago - Pushed at: almost 5 years ago - Stars: 98 - Forks: 24

sungnyun/cav2vec
(ICLR 2025) Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Language: Python - Size: 3.55 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 0

zulfiqar-ali01/audio-visual-Transcription
Real-Time Audio-Visual Speech Recognition
Language: Python - Size: 33.1 MB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 4 - Forks: 0

umbertocappellazzo/Llama-AVSR
[ICASSP 2025] Official PyTorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners".
Language: Python - Size: 20.5 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

sungnyun/avsr-temporal-dynamics
(SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition
Language: Python - Size: 7.8 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 9 - Forks: 0

luomingshuang/lipreading_with_icefall
In this repository, I try to use k2, icefall and Lhotse for lip reading. I will modify it for the lip reading task. Many different lip-reading datasets should be added. -_-
Language: Python - Size: 2.69 MB - Last synced at: 4 months ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

Sreyan88/LipGER
Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
Language: Python - Size: 1.16 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 8 - Forks: 1

lzuwei/end-to-end-multiview-lipreading
End to End Multiview Lip Reading
Language: Python - Size: 168 KB - Last synced at: about 2 months ago - Pushed at: over 7 years ago - Stars: 10 - Forks: 2

georgesterpu/Taris
Transformer-based online speech recognition system with TensorFlow 2
Language: Python - Size: 5.4 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 25 - Forks: 6

hmeutzner/kaldi-avsr
Kaldi-based audio-visual speech recognition
Language: Shell - Size: 41 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 6 - Forks: 6

Remi-Gau/McGurk_prior_code
Code related to the fMRI experiment on the contextual modulation of the McGurk Effect
Language: MATLAB - Size: 928 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 2

karlsimsBBC/cassette-bot
🤖 📼 Command-line tool for remixing videos with time-coded transcriptions.
Language: Python - Size: 25 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 1
