GitHub topics: multimodal-learning
fmenat/MultiviewCropClassification
Public repository of our IGARSS 2023 submission
Language: Python - Size: 132 MB - Last synced at: 29 minutes ago - Pushed at: about 1 hour ago - Stars: 16 - Forks: 1

fmenat/multiviewRS-models
List of deep learning models proposed for remote sensing (RS) multi-view data
Size: 17.6 KB - Last synced at: 32 minutes ago - Pushed at: about 2 hours ago - Stars: 11 - Forks: 0

egeyavuzcan/diffusion-flow-models-research
A comprehensive collection of research papers and resources on diffusion- and flow-based models, systematically organized by application and architecture. It highlights cutting-edge advances in flow-guided diffusion techniques for image, video, and multimodal generation.
Size: 1.95 KB - Last synced at: about 4 hours ago - Pushed at: about 6 hours ago - Stars: 0 - Forks: 0

friedrichor/Awesome-Multimodal-Papers
A curated list of awesome Multimodal studies.
Language: HTML - Size: 63.3 MB - Last synced at: about 7 hours ago - Pushed at: about 9 hours ago - Stars: 192 - Forks: 19

lll6gg/UI-R1
Code for "UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning"
Language: Python - Size: 1.08 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 91 - Forks: 6

willxxy/awesome-mmps
Corpus of resources for multimodal machine learning with physiological signals (mmps).
Size: 1.17 MB - Last synced at: about 5 hours ago - Pushed at: 4 days ago - Stars: 80 - Forks: 2

Eurus-Holmes/Awesome-Multimodal-Research
A curated list of Multimodal Related Research.
Language: Python - Size: 903 MB - Last synced at: 2 days ago - Pushed at: almost 2 years ago - Stars: 1,348 - Forks: 150

ytunprovoke/image-optimization-guide
Best practices for image optimization without losing quality. Improve your website speed and performance.
Size: 4.88 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 1 - Forks: 0

pliang279/MultiBench
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
Language: HTML - Size: 49.9 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 541 - Forks: 80

AdityaLab/MM4TSA
A professional list of papers and resources on Multi-Modalities For Time Series Analysis (MM4TSA).
Size: 457 KB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 35 - Forks: 1

haomo-ai/EcoDatum
The official implementation of [Quality over Quantity: Boosting Data Efficiency Through Ensembled Multimodal Data Curation] in AAAI2025.
Language: Python - Size: 3.58 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 4 - Forks: 0

pliang279/awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
Size: 459 KB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 6,421 - Forks: 879

The-Martyr/Awesome-Multimodal-Reasoning
Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models
Size: 107 KB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 20 - Forks: 0

JoshD898/caretMultimodal
Multimodal model training in R
Language: R - Size: 2.29 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 0 - Forks: 0

akusayudodograu/Agentic-RAG-Story-Generation-with-Multimodal-GenAI
Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling
Size: 1000 Bytes - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 8 - Forks: 1

kyegomez/NaViT
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
Language: Python - Size: 210 KB - Last synced at: 1 day ago - Pushed at: about 1 month ago - Stars: 230 - Forks: 11

DmitryRyumin/ICASSP-2023-24-Papers
ICASSP 2023-2024 Papers: A complete collection of influential and exciting research papers from the ICASSP 2023-24 conferences. Explore the latest advancements in acoustics, speech and signal processing. Code included. Star the repository to support the advancement of audio and signal processing!
Language: Python - Size: 9.11 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 459 - Forks: 18

MusadiqPasha/Multi-Modal-Meme-Virality-Prediction-using-HGNNs
A multimodal meme virality predictor that integrates image, text, and metadata through ensemble learning and Hypergraph Neural Networks to classify memes as viral or non-viral.
Language: Jupyter Notebook - Size: 13.7 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 1 - Forks: 0

richard-peng-xia/awesome-multimodal-in-medical-imaging
A collection of resources on applications of multi-modal learning in medical imaging.
Size: 234 KB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 730 - Forks: 65

Xovee/skapp Fork of YifanZhang-git/SKAPP
AAAI '25. Retrieval-Augmented Multimodal Social Media Popularity Prediction
Language: Python - Size: 89.8 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 17 - Forks: 0

willxxy/ECG-Bench
A Unified Framework for Benchmarking Generative Electrocardiogram-Language Models (ELMs)
Language: Python - Size: 6.67 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 9 - Forks: 2

sangminwoo/awesome-vision-and-language
A curated list of awesome vision and language resources (still under construction... stay tuned!)
Size: 127 KB - Last synced at: 4 days ago - Pushed at: 6 months ago - Stars: 535 - Forks: 41

SuperBruceJia/Awesome-Mixture-of-Experts
Awesome Mixture of Experts (MoE): A Curated List of Mixture of Experts (MoE) and Mixture of Multimodal Experts (MoME)
Size: 438 KB - Last synced at: 10 days ago - Pushed at: 4 months ago - Stars: 27 - Forks: 3

mims-harvard/COMPASS-web
Web Companion for Generalizable AI predicts immunotherapy outcomes across cancers and treatments
Language: Jupyter Notebook - Size: 53 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

mims-harvard/COMPASS
Generalizable AI predicts immunotherapy outcomes across cancers and treatments
Language: Jupyter Notebook - Size: 676 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 2 - Forks: 0

Khamies/comparative-study-brain-registration
Comparative study of classical and deep learning methods for multi-modal brain image registration using the RIRE dataset.
Language: Jupyter Notebook - Size: 23.5 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1 - Forks: 0

thubZ09/All-Things-Multimodal
Hub for researchers exploring VLMs and Multimodal Learning:)
Size: 48 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 26 - Forks: 1

aclai-lab/MultiData.jl
Multimodal datasets for Machine-Learning
Language: Julia - Size: 389 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 3 - Forks: 0

pykale/pykale
Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem. ⭐ Star to support our work!
Language: Python - Size: 46.3 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 457 - Forks: 66

lucaswychan/quant-lvlm
Easy-to-use large vision language model pipeline for quantitative analysis
Language: Python - Size: 953 KB - Last synced at: 5 days ago - Pushed at: 16 days ago - Stars: 2 - Forks: 0

ParitoshParmar/Piano-Skills-Assessment
Piano Skills Assessment [IEEE MMSP 2021]
Language: Python - Size: 854 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 17 - Forks: 2

alipay/Ant-Multi-Modal-Framework
Research Code for Multimodal-Cognition Team in Ant Group
Language: Python - Size: 17 MB - Last synced at: 17 days ago - Pushed at: 10 months ago - Stars: 143 - Forks: 5

olivier-bernard-creatis/olivier-bernard-creatis.github.io Fork of academicpages/academicpages.github.io
Website of Olivier Bernard, Professor at the university of Lyon (INSA) and Deputy Director of the CREATIS research laboratory
Language: Jupyter Notebook - Size: 151 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

ys-zong/awesome-self-supervised-multimodal-learning
[T-PAMI] A curated list of self-supervised multimodal learning resources.
Size: 5.32 MB - Last synced at: 2 days ago - Pushed at: 9 months ago - Stars: 252 - Forks: 8

ZaneBrackley/VIZMed
Thesis Project | Vision-Integrated Zero-Shot Medical AI
Language: Python - Size: 88.9 KB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

ilaria-manco/multimodal-ml-music
List of academic resources on Multimodal ML for Music
Language: TeX - Size: 268 KB - Last synced at: 2 days ago - Pushed at: about 2 years ago - Stars: 295 - Forks: 11

mims-harvard/AIM2
Artificial Intelligence in Medicine II
Language: HTML - Size: 341 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 3 - Forks: 0

PreferredAI/cornac
A Comparative Framework for Multimodal Recommender Systems
Language: Python - Size: 24.3 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 949 - Forks: 152

ChocoWu/SeTok
Code for the paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM
Language: Python - Size: 2.1 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 54 - Forks: 0

HenryHZY/Awesome-Multimodal-LLM
Research Trends in LLM-guided Multimodal Learning.
Size: 17.6 KB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 358 - Forks: 16

microsoft/XPretrain
Multi-modality pre-training
Language: Python - Size: 3.59 MB - Last synced at: 4 days ago - Pushed at: about 1 year ago - Stars: 491 - Forks: 37

mbzuai-oryx/Camel-Bench
[NAACL 2025 🔥] CAMEL-Bench is an Arabic benchmark for evaluating multimodal models across eight domains with 29,000 questions.
Language: Python - Size: 14 MB - Last synced at: 10 days ago - Pushed at: 24 days ago - Stars: 31 - Forks: 1

IRVLUTD/Proto-CLIP
Code release for Proto-CLIP: Vision-Language Prototypical Network for Few-Shot Learning
Language: Python - Size: 69.1 MB - Last synced at: 4 days ago - Pushed at: 4 months ago - Stars: 41 - Forks: 6

IBM/AdaMML 📦
Official implementation of AdaMML. https://arxiv.org/abs/2105.05165.
Language: Python - Size: 113 KB - Last synced at: 5 days ago - Pushed at: about 3 years ago - Stars: 51 - Forks: 9

machine-intelligence-laboratory/TopicNet
Interface for easier topic modelling.
Language: Python - Size: 10.5 MB - Last synced at: 8 days ago - Pushed at: 10 months ago - Stars: 139 - Forks: 17

Hoar012/TDC-Video
Size: 3.05 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Language: Python - Size: 186 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 413 - Forks: 34

VectorInstitute/mmlearn
A toolkit for research on multimodal representation learning
Language: Python - Size: 4.92 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 15 - Forks: 3

Hyeongkeun/LAVCap
Official PyTorch Implementation of 'LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport' (ICASSP2025)
Language: Python - Size: 3.58 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 3 - Forks: 0

JanneHonkonen/ideas
My AI based ideas, designs and whatnot
Size: 22.5 KB - Last synced at: 27 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

ChiShengChen/MUSE_EEG
The official implementation of Mind's eye: image recognition by EEG via multimodal similarity-keeping contrastive learning.
Language: Python - Size: 20.8 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 30 - Forks: 0

HUANGLIZI/LViT
[IEEE Transactions on Medical Imaging/TMI] This repo is the official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"
Language: Python - Size: 90 MB - Last synced at: 29 days ago - Pushed at: 2 months ago - Stars: 338 - Forks: 32

pliang279/MFN
[AAAI 2018] Memory Fusion Network for Multi-view Sequential Learning
Language: Python - Size: 56.7 MB - Last synced at: 6 days ago - Pushed at: almost 5 years ago - Stars: 115 - Forks: 30

Haoyu-ha/LNLN
Towards Robust Multimodal Sentiment Analysis with Incomplete Data
Language: Python - Size: 29.3 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 53 - Forks: 4

amariucaitheodor/acquiring-linguistic-knowledge
Master's thesis of Theodor Amariucai, supervised by Alexander Warstadt and Prof. Ryan Cotterell.
Language: Python - Size: 5.14 MB - Last synced at: 29 days ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 1

mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
Language: Python - Size: 7.36 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 3,882 - Forks: 301

VectorInstitute/shared-encoder
Codebase for the paper titled 'A Shared Encoder Approach to Multimodal Representation Learning'
Language: Python - Size: 141 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 1

praveena2j/Cross-Attentional-AV-Fusion
FG2021: Cross Attentional AV Fusion for Dimensional Emotion Recognition
Language: Python - Size: 92.8 KB - Last synced at: 29 days ago - Pushed at: 5 months ago - Stars: 28 - Forks: 5

praveena2j/Joint-Cross-Attention-for-Audio-Visual-Fusion
IEEE T-BIOM : "Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention"
Language: Python - Size: 290 KB - Last synced at: 29 days ago - Pushed at: 5 months ago - Stars: 38 - Forks: 11

KaiyangZhou/CoOp
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
Language: Python - Size: 1.38 MB - Last synced at: about 1 month ago - Pushed at: 12 months ago - Stars: 1,926 - Forks: 214

willxxy/ECG-Byte
[arxiv 2024] ECG-Byte: A Tokenizer for End-to-End Generative Electrocardiogram Language Modeling
Language: Python - Size: 27.5 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 14 - Forks: 0

MingliangLiang3/GLIP
Centered Masking for Language-Image Pre-training
Language: Jupyter Notebook - Size: 15.9 MB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

t0gae/AI-Dementia-Diagnosis
AI-Driven Multimodal Dementia Diagnosis: 3D MRI morphometry and sensor data using cross-modal attention (LSTM + 3D-ResNet + Transformer). Aims to reduce late-stage diagnosis by 60% through early detection.
Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

AILab-CVC/UniRepLKNet
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
Language: Python - Size: 4.82 MB - Last synced at: 28 days ago - Pushed at: 7 months ago - Stars: 980 - Forks: 57

aiishwarrya/VisualLanguageModel
A custom Vision-Language Model (VLM) built from scratch, using SigLip for contrastive learning and a ViT-based encoder to generate meaningful image captions and semantic descriptions.
Size: 2.49 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

mmaaz60/mvits_for_class_agnostic_od
[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".
Language: Python - Size: 34.1 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 308 - Forks: 25

mbaqer/V2X-mmWave-Beamforming
PyTorch implementation of multi-modality sensing in 60 GHz mmWave beamforming for connected vehicles.
Language: Jupyter Notebook - Size: 5.09 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

3dlg-hcvc/tricolo
[WACV 2024] TriCoLo: Trimodal Contrastive Loss for Text to Shape Retrieval
Language: Python - Size: 7.17 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 25 - Forks: 1

declare-lab/LLM-PuzzleTest
This repository is maintained to release datasets and models for multimodal puzzle reasoning.
Language: Python - Size: 131 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 78 - Forks: 7

pej0918/Prompt-The-Missing
[CVPR 2025 Workshop] Prompt The Missing : Efficient and Robust Audio-Visual Classification under Uncertain Modalities
Language: Python - Size: 3.44 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

Pointcept/GPT4Point
[CVPR'24 Highlight] GPT4Point: A Unified Framework for Point-Language Understanding and Generation.
Language: Python - Size: 114 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 381 - Forks: 24

ArrowLuo/CLIP4Clip
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Language: Python - Size: 1.61 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 929 - Forks: 126

henghuiding/ReLA
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
Language: Python - Size: 2.06 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 693 - Forks: 19

aehrc/cxrmate
CXRMate: Longitudinal Data and a Semantic Similarity Reward for Chest X-Ray Report Generation
Language: Python - Size: 4.03 MB - Last synced at: 30 days ago - Pushed at: 3 months ago - Stars: 15 - Forks: 3

DmitryRyumin/ICCV-2023-Papers
ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support visual intelligence development!
Language: Python - Size: 16.8 MB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 954 - Forks: 43

xieh97/language-based-audio-retrieval
List of academic resources on Language-Based Audio Retrieval
Size: 7.81 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

henghuiding/MeViS
[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
Language: Python - Size: 52.2 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 521 - Forks: 22

mhw32/multimodal-vae-public
A PyTorch implementation of "Multimodal Generative Models for Scalable Weakly-Supervised Learning" (https://arxiv.org/abs/1802.05335)
Language: Python - Size: 3.9 MB - Last synced at: 21 days ago - Pushed at: over 6 years ago - Stars: 158 - Forks: 36

miccunifi/SEARLE
[ICCV 2023] - Zero-shot Composed Image Retrieval with Textual Inversion
Language: Python - Size: 20.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 170 - Forks: 10

TencentARC/ViT-Lens
[CVPR 2024] ViT-Lens: Towards Omni-modal Representations
Language: Python - Size: 132 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 174 - Forks: 10

pliang279/MultiViz
[ICLR 2023] MultiViz: Towards Visualizing and Understanding Multimodal Models
Language: Python - Size: 790 MB - Last synced at: 29 days ago - Pushed at: 9 months ago - Stars: 96 - Forks: 5

sbelharbi/feature-vs-text-compound-emotion
Textualized and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild, ABAW 7th - Challenge - Compound Expression (CE) Recognition Challenge
Language: Python - Size: 1.41 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 4 - Forks: 0

ksm26/Open-Source-Models-with-Hugging-Face
"Open Source Models with Hugging Face" course empowers you with the skills to leverage open-source models from the Hugging Face Hub for various tasks in NLP, audio, image, and multimodal domains.
Language: Jupyter Notebook - Size: 21 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 24 - Forks: 19

merveenoyan/siglip
Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗
Language: Jupyter Notebook - Size: 1.66 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 224 - Forks: 12

mims-harvard/Madrigal
Madrigal: Multimodal AI predicts clinical outcomes of drug combinations from preclinical data
Language: Jupyter Notebook - Size: 18.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 20 - Forks: 6

fork123aniket/Multi-Round-VLM-powered-Multimodal-Conversational-AI-Navigation-Bot
Streamlit App Combining Vision, Language, and Audio AI Models
Language: Python - Size: 18.6 KB - Last synced at: 19 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

Hoar012/RAP-MLLM
[CVPR 2025] RAP: Retrieval-Augmented Personalization
Language: Python - Size: 57.7 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 30 - Forks: 0

breezedeus/Coin-CLIP
Coin-CLIP: a CLIP model fine-tuned on a vast collection of coin images using contrastive learning. It enhances feature extraction for coins, boosting image search accuracy. This model merges Vision Transformer (ViT) with CLIP's multimodal learning, optimized for numismatic applications.
Language: Python - Size: 50.3 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 20 - Forks: 3

ai4ce/EgoPAT3D
[CVPR 2022] Egocentric Action Target Prediction in 3D
Language: Jupyter Notebook - Size: 93.3 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 32 - Forks: 3

Jorffy/DAIE
Code for "Dual-Level Adaptive Incongruity-Enhanced Model for Multimodal Sarcasm Detection".
Language: Python - Size: 183 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 20 - Forks: 0

praveena2j/RJCMA
ABAW6 (CVPR-W): We achieved second place in the valence-arousal challenge of ABAW6.
Language: Python - Size: 171 KB - Last synced at: 29 days ago - Pushed at: 12 months ago - Stars: 18 - Forks: 3

kyegomez/CM3Leon
An open-source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multimodal AI that uses just a decoder to generate both text and images
Language: Python - Size: 754 KB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 359 - Forks: 18

taco-group/DecAlign
A novel cross-modal decoupling and alignment framework for multimodal representation learning.
Language: JavaScript - Size: 13.8 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

praveena2j/RJCAforSpeakerVerification
[FG 2024] "Audio-Visual Person Verification based on Recursive Fusion of Joint Cross-Attention"
Language: Python - Size: 1 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 4 - Forks: 0

pengfei-luo/multimodal-knowledge-graph
A collection of resources on multimodal knowledge graph, including datasets, papers and contests.
Size: 50.8 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 162 - Forks: 17

snap-research/MMVID
[CVPR 2022] Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning
Language: Python - Size: 77.5 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 192 - Forks: 23

zjunlp/HVPNeT
[NAACL 2022 Findings] Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction
Language: Python - Size: 1.88 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 108 - Forks: 11

OFA-Sys/OFASys
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Language: Python - Size: 20.3 MB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 147 - Forks: 13

jyrao/UniSoccer
[CVPR 2025] "Towards Universal Soccer Video Understanding".
Language: Python - Size: 80.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 106 - Forks: 5

kyegomez/AutoRT
Implementation of AutoRT: "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents"
Language: Python - Size: 2.49 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 39 - Forks: 3
