An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: audio-visual-speech-recognition

smeetrs/deep_avsr

A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.

Language: Python - Size: 45.9 KB - Last synced at: 5 days ago - Pushed at: over 1 year ago - Stars: 232 - Forks: 41

aidayang/FunASR-OneClick

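FunASR real-time speech recognition edition: recognizes audio from the microphone and sound played on the computer; voice typing software for PCs.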

Size: 22.5 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 4 - Forks: 0

modelscope/FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Language: Python - Size: 100 MB - Last synced at: 11 days ago - Pushed at: 14 days ago - Stars: 10,758 - Forks: 1,084

ankurbhatia24/MULTIMODAL-EMOTION-RECOGNITION

Human emotion understanding using a multimodal dataset.

Language: Jupyter Notebook - Size: 5.72 MB - Last synced at: 3 days ago - Pushed at: almost 5 years ago - Stars: 98 - Forks: 24

sungnyun/cav2vec

(ICLR 2025) Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation

Language: Python - Size: 3.55 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 8 - Forks: 0

zulfiqar-ali01/audio-visual-Transcription

Real-Time Audio-Visual Speech Recognition

Language: Python - Size: 33.1 MB - Last synced at: 2 months ago - Pushed at: 10 months ago - Stars: 4 - Forks: 0

umbertocappellazzo/Llama-AVSR

[ICASSP 2025] Official PyTorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners".

Language: Python - Size: 20.5 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

sungnyun/avsr-temporal-dynamics

(SLT 2024) Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition

Language: Python - Size: 7.8 MB - Last synced at: 2 months ago - Pushed at: 8 months ago - Stars: 9 - Forks: 0

luomingshuang/lipreading_with_icefall

In this repository, I try to use k2, icefall and Lhotse for lip reading. I will modify it for the lip reading task. Many different lip-reading datasets should be added. -_-

Language: Python - Size: 2.69 MB - Last synced at: 4 months ago - Pushed at: about 3 years ago - Stars: 2 - Forks: 0

Sreyan88/LipGER

Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition

Language: Python - Size: 1.16 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 8 - Forks: 1

lzuwei/end-to-end-multiview-lipreading

End to End Multiview Lip Reading

Language: Python - Size: 168 KB - Last synced at: about 2 months ago - Pushed at: over 7 years ago - Stars: 10 - Forks: 2

georgesterpu/Taris

Transformer-based online speech recognition system with TensorFlow 2

Language: Python - Size: 5.4 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 25 - Forks: 6

hmeutzner/kaldi-avsr

Kaldi-based audio-visual speech recognition

Language: Shell - Size: 41 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 6 - Forks: 6

Remi-Gau/McGurk_prior_code

Code related to the fMRI experiment on the contextual modulation of the McGurk Effect

Language: MATLAB - Size: 928 KB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 2

karlsimsBBC/cassette-bot

🤖 📼 Command-line tool for remixing videos with time-coded transcriptions.

Language: Python - Size: 25 MB - Last synced at: over 2 years ago - Pushed at: over 5 years ago - Stars: 5 - Forks: 1