GitHub topics: audio-visual-learning

Repositories

praveena2j/Cross-Attentional-AV-Fusion

FG2021: Cross Attentional AV Fusion for Dimensional Emotion Recognition

Language: Python - Size: 92.8 KB - Last synced at: 8 days ago - Pushed at: 5 months ago - Stars: 28 - Forks: 5

praveena2j/Joint-Cross-Attention-for-Audio-Visual-Fusion

IEEE T-BIOM: "Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention"

Language: Python - Size: 290 KB - Last synced at: 8 days ago - Pushed at: 5 months ago - Stars: 38 - Forks: 11

ali-vilab/dreamtalk

Official implementation of the paper "DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models"

Language: Python - Size: 31.6 MB - Last synced at: 13 days ago - Pushed at: over 1 year ago - Stars: 1,704 - Forks: 206

YapengTian/AVE-ECCV18

Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018

Language: Python - Size: 18.2 MB - Last synced at: 14 days ago - Pushed at: about 4 years ago - Stars: 180 - Forks: 32

Davidlequnchen/LDED-FusionNet

LDED-FusionNet: Machine Learning-Based Audio-Visual Defect Detection for LDED AM Process

Language: Jupyter Notebook - Size: 1.18 GB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 1

ttgeng233/UnAV

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)

Language: Python - Size: 19.9 MB - Last synced at: 13 days ago - Pushed at: about 1 year ago - Stars: 63 - Forks: 6

aiden200/SoundQ2

Sound event localization and detection in 360-degree audio-visual soundscapes.

Language: Python - Size: 175 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

praveena2j/RJCAforSpeakerVerification

[FG 2024] "Audio-Visual Person Verification based on Recursive Fusion of Joint Cross-Attention"

Language: Python - Size: 1 MB - Last synced at: 14 days ago - Pushed at: 5 months ago - Stars: 4 - Forks: 0

praveena2j/JointCrossAttentional-AV-Fusion

ABAW3 (CVPRW): A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition

Language: Python - Size: 148 KB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 43 - Forks: 9

OpenNLPLab/AVSBench

[ECCV 2022] & [IJCV 2024] Official implementation of the paper: Audio-Visual Segmentation (with Semantics)

Language: Python - Size: 43.8 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 393 - Forks: 35

aromanusc/SoundQ

Enhanced sound event localization and detection in real 360-degree audio-visual soundscapes (DCASE task3 format)

Language: Python - Size: 129 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 2

praveena2j/RecurrentJointAttentionwithLSTMs

ICASSP 2023: "Recursive Joint Attention for Audio-Visual Fusion in Regression Based Emotion Recognition"

Language: Python - Size: 253 KB - Last synced at: 8 days ago - Pushed at: 5 months ago - Stars: 12 - Forks: 0

tanshuai0219/EDTalk

[ECCV 2024 Oral] EDTalk - Official PyTorch Implementation

Language: Python - Size: 46.7 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 374 - Forks: 36

praveena2j/Dynamic-CrossAttention

IEEE ICME: "Cross-Attention is not always needed: Dynamic Cross-Attention for Audio-Visual Dimensional Emotion Recognition"

Language: Python - Size: 2.26 MB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

alvinliu0/HA2G

[CVPR 2022] Code for "Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation"

Language: Python - Size: 2.61 MB - Last synced at: 5 months ago - Pushed at: about 2 years ago - Stars: 129 - Forks: 9

stoneMo/DeepAVFusion

Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".

Language: Python - Size: 26.4 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 12 - Forks: 0

Huntersxsx/AVVP-Learning-List

Related papers about Weakly-supervised Audio-Visual Video Parsing (AVVP) & Audio-Visual Event Localization (AVE)

Size: 819 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 1 - Forks: 0

roger-tseng/av-superb

A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)

Language: Python - Size: 64.8 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 32 - Forks: 4

dkurzend/ClipClap-GZSL

Audio-Visual Generalized Zero-Shot Learning using Large Pre-Trained Models

Language: Python - Size: 27.6 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

kyuyeonpooh/objects-that-sound

An unofficial implementation of the ECCV 2018 paper "Objects that Sound".

Language: Python - Size: 163 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 32 - Forks: 4

jasongief/PSP_CVPR_2021

[2021 CVPR] Positive Sample Propagation along the Audio-Visual Event Line

Language: Python - Size: 1.19 MB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 37 - Forks: 10

stoneMo/CIGN

Official implementation for CIGN

Language: Python - Size: 5.31 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

rhgao/co-separation

Co-Separating Sounds of Visual Objects (ICCV 2019)

Language: Python - Size: 465 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 78 - Forks: 24

jasongief/CPSP

[2023 TPAMI] Contrastive Positive Sample Propagation along the Audio-Visual Event Line

Language: Python - Size: 498 KB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 16 - Forks: 3

MengyuanChen21/CVPR2023-CMPAE

[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception

Language: Python - Size: 1.4 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 10 - Forks: 0

stoneMo/MGN

Official implementation for MGN

Language: Python - Size: 16.6 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 12 - Forks: 0

stoneMo/EZ-VSL

Official Codebase of "Localizing Visual Sounds the Easy Way" (ECCV 2022)

Language: Python - Size: 17.5 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 17 - Forks: 2

yanbeic/CCL

[CVPR 2021] PyTorch implementation of the paper "Distilling Audio-Visual Knowledge by Compositional Contrastive Learning"

Language: Python - Size: 4.07 MB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 76 - Forks: 11

stoneMo/SLAVC

Official Codebase of "A Closer Look at Weakly-Supervised Audio-Visual Source Localization" (NeurIPS 2022)

Language: Python - Size: 15.6 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 7 - Forks: 1

Tinglok/avstyle

Codebase for the Paper: Learning Visual Styles from Audio-Visual Associations (ECCV 2022, in PyTorch)

Language: Python - Size: 6.59 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 12 - Forks: 0

kvilouras/AV-SSRL

MSc Thesis "Audio-Visual Self-Supervised Representation Learning in-the-wild"

Language: Jupyter Notebook - Size: 4.61 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 0 - Forks: 0

ly-zhu/ly-zhu.github.io

Projects webpage

Language: HTML - Size: 41.1 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 0

Related Keywords

audio-visual-learning (32), multimodal-learning (8), attention-model (6), affective-computing (5), emotion-recognition (5), self-supervised-learning (4), audio-visual-correspondence (4), audio-visual-events (4), machine-learning (3), sound-localization (3), attention (3), computer-vision (3), visual-sound-localization (2), attention-mechanism (2), deep-learning (2), sound-source-localization (2), sound-source-separation (2), audio-visual-parsing (2), audio-visual-video-parsing (2), weakly-supervised-learning (2), representation-learning (2), audio-visual (2), video-generation (2), talking-head (2), face-animation (2), emotion (2), class-incremental-learning (1), continual-learning (1), semantic-segmentation (1), eccv2018 (1), cross-modal-retrieval (1), audioset (1), zsl (1), zero-shot-learning (1), learning (1), gzsl (1), generalized-zero-shot-learning (1), clip (1), self-supervision (1), portrait-segmentation (1), audio-visual-applications (1), image-manipulation (1), image-generation (1), generative-adversarial-network (1), gans (1), silence (1), overfitting (1), video-recognition (1), pytorch (1), multi-modal-distillation (1), distillation (1), cvpr2021 (1), contrastive-learning (1), compositional-contrastive-learning (1), audio-teacher-models (1), video-understanding (1), cvpr2023 (1), sound-separation (1), cross-modality (1), clap (1), audio-visual-seld (1), segmentation-benchmark (1), multi-modal-segmentation (1), audio-visual-segmentation (1), speaker-verification (1), grounded-sam-2 (1), audio (1), multi-modal-learning (1), process-monitoring (1), multisensor (1), multimodal-deep-learning (1), laser-directed-energy-deposition (1), defect-detection (1), data-fusion (1), additive-manufacturing (1), eccv-2018 (1), ave-dataset (1), audio-visual-generalized-zero-shot-learning (1), audio-visual-event-localization (1), transformer-architecture (1), masked-image-modeling (1), masked-autoencoder (1), cvpr2022 (1), co-speech-gesture (1), cross-attention (1), ai (1), talking-face-generation (1), yolov8 (1), yolov5 (1), sound-detection (1), seldnet (1), seld (1), detic (1), dcase2023 (1)
