GitHub topics: multimodal-learning
thubZ09/All-Things-Multimodal
Hub for researchers exploring VLMs and Multimodal Learning:)
Size: 47.9 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 25 - Forks: 1

ChocoWu/SeTok
Code for the paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM
Language: Python - Size: 2.1 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 54 - Forks: 0

HenryHZY/Awesome-Multimodal-LLM
Research Trends in LLM-guided Multimodal Learning.
Size: 17.6 KB - Last synced at: about 23 hours ago - Pushed at: over 1 year ago - Stars: 358 - Forks: 16

JoshD898/caretMultimodal
Multimodal model training in R
Language: R - Size: 2.28 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

SuperBruceJia/Awesome-Mixture-of-Experts
Awesome Mixture of Experts (MoE): A Curated List of Mixture of Experts (MoE) and Mixture of Multimodal Experts (MoME)
Size: 438 KB - Last synced at: 2 days ago - Pushed at: 3 months ago - Stars: 24 - Forks: 3

AdityaLab/MM4TSA
A professional list of Multi-Modalities For Time Series Analysis (MM4TSA) Papers and Resources.
Size: 457 KB - Last synced at: 1 day ago - Pushed at: 18 days ago - Stars: 27 - Forks: 0

microsoft/XPretrain
Multi-modality pre-training
Language: Python - Size: 3.59 MB - Last synced at: 1 day ago - Pushed at: 12 months ago - Stars: 491 - Forks: 37

ytunprovoke/image-optimization-guide
Best practices for image optimization without losing quality. Improve your website speed and performance.
Size: 4.88 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

mbzuai-oryx/Camel-Bench
[NAACL 2025 🔥] CAMEL-Bench is an Arabic benchmark for evaluating multimodal models across eight domains with 29,000 questions.
Language: Python - Size: 14 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 31 - Forks: 1

Eurus-Holmes/Awesome-Multimodal-Research
A curated list of Multimodal Related Research.
Language: Python - Size: 903 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 1,346 - Forks: 149

The-Martyr/Awesome-Multimodal-Reasoning
Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models
Size: 60.5 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 19 - Forks: 0

Hoar012/TDC-Video
Size: 3.05 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 0 - Forks: 0

MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Language: Python - Size: 186 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 413 - Forks: 34

willxxy/ECG-Bench
A Unified Framework for Benchmarking Generative Electrocardiogram-Language Models (ELMs)
Language: Python - Size: 6.68 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 7 - Forks: 1

akusayudodograu/Agentic-RAG-Story-Generation-with-Multimodal-GenAI
Multimodal Agentic GenAI Workflow: seamlessly blends retrieval and generation for intelligent storytelling
Size: 1000 Bytes - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 8 - Forks: 1

PreferredAI/cornac
A Comparative Framework for Multimodal Recommender Systems
Language: Python - Size: 24.2 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 945 - Forks: 153

Hyeongkeun/LAVCap
Official Pytorch Implementation of 'LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport' (ICASSP2025)
Language: Python - Size: 3.58 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 3 - Forks: 0

ys-zong/awesome-self-supervised-multimodal-learning
[T-PAMI] A curated list of self-supervised multimodal learning resources.
Size: 5.32 MB - Last synced at: 7 days ago - Pushed at: 8 months ago - Stars: 251 - Forks: 8

JanneHonkonen/ideas
My AI based ideas, designs and whatnot
Size: 22.5 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 0 - Forks: 0

ChiShengChen/MUSE_EEG
The official implementation of Mind's eye: image recognition by EEG via multimodal similarity-keeping contrastive learning.
Language: Python - Size: 20.8 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 30 - Forks: 0

HUANGLIZI/LViT
[IEEE Transactions on Medical Imaging/TMI] This repo is the official implementation of "LViT: Language meets Vision Transformer in Medical Image Segmentation"
Language: Python - Size: 90 MB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 338 - Forks: 32

mims-harvard/AIM2
Artificial Intelligence in Medicine II
Language: HTML - Size: 336 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 3 - Forks: 0

pliang279/awesome-multimodal-ml
Reading list for research topics in multimodal machine learning
Size: 459 KB - Last synced at: 9 days ago - Pushed at: 8 months ago - Stars: 6,381 - Forks: 875

willxxy/awesome-mmps
Corpus of resources for multimodal machine learning with physiological signals (mmps).
Size: 1.03 MB - Last synced at: 10 days ago - Pushed at: 17 days ago - Stars: 70 - Forks: 2

richard-peng-xia/awesome-multimodal-in-medical-imaging
A collection of resources on applications of multi-modal learning in medical imaging.
Size: 234 KB - Last synced at: 12 days ago - Pushed at: 19 days ago - Stars: 714 - Forks: 68

Haoyu-ha/LNLN
Towards Robust Multimodal Sentiment Analysis with Incomplete Data
Language: Python - Size: 29.3 KB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 53 - Forks: 4

amariucaitheodor/acquiring-linguistic-knowledge
Master's thesis of Theodor Amariucai, supervised by Alexander Warstadt and Prof. Ryan Cotterell.
Language: Python - Size: 5.14 MB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 1

mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
Language: Python - Size: 7.36 MB - Last synced at: 12 days ago - Pushed at: 8 months ago - Stars: 3,882 - Forks: 301

VectorInstitute/shared-encoder
Codebase for the paper titled 'A Shared Encoder Approach to Multimodal Representation Learning'
Language: Python - Size: 141 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 7 - Forks: 1

ilaria-manco/multimodal-ml-music
List of academic resources on Multimodal ML for Music
Language: TeX - Size: 268 KB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 293 - Forks: 11

praveena2j/Cross-Attentional-AV-Fusion
FG2021: Cross Attentional AV Fusion for Dimensional Emotion Recognition
Language: Python - Size: 92.8 KB - Last synced at: 9 days ago - Pushed at: 5 months ago - Stars: 28 - Forks: 5

praveena2j/Joint-Cross-Attention-for-Audio-Visual-Fusion
IEEE T-BIOM : "Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention"
Language: Python - Size: 290 KB - Last synced at: 9 days ago - Pushed at: 5 months ago - Stars: 38 - Forks: 11

VectorInstitute/mmlearn
A toolkit for research on multimodal representation learning
Language: Python - Size: 4.91 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 14 - Forks: 3

kyegomez/NaViT
My implementation of "Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
Language: Python - Size: 210 KB - Last synced at: 13 days ago - Pushed at: 17 days ago - Stars: 226 - Forks: 10

KaiyangZhou/CoOp
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
Language: Python - Size: 1.38 MB - Last synced at: 14 days ago - Pushed at: 11 months ago - Stars: 1,926 - Forks: 214

willxxy/ECG-Byte
[arxiv 2024] ECG-Byte: A Tokenizer for End-to-End Generative Electrocardiogram Language Modeling
Language: Python - Size: 27.5 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 14 - Forks: 0

MingliangLiang3/GLIP
Centered Masking for Language-Image Pre-training
Language: Jupyter Notebook - Size: 15.9 MB - Last synced at: 8 days ago - Pushed at: 15 days ago - Stars: 1 - Forks: 0

t0gae/AI-Dementia-Diagnosis
AI-Driven Multimodal Dementia Diagnosis: 3D MRI morphometry, and sensor data using cross-modal attention (LSTM + 3D-ResNet + Transformer). Aims to reduce late-stage diagnosis by 60% through early detection.
Language: Jupyter Notebook - Size: 13.7 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 0 - Forks: 0

sangminwoo/awesome-vision-and-language
A curated list of awesome vision and language resources (still under construction... stay tuned!)
Size: 127 KB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 532 - Forks: 41

friedrichor/Awesome-Multimodal-Papers
A curated list of awesome Multimodal studies.
Language: HTML - Size: 63.2 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 172 - Forks: 16

AILab-CVC/UniRepLKNet
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
Language: Python - Size: 4.82 MB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 980 - Forks: 57

aiishwarrya/VisualLanguageModel
A custom Vision-Language Model (VLM) built from scratch, using SigLIP for contrastive learning and a ViT-based encoder to generate meaningful image captions and semantic descriptions.
Size: 2.49 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

mmaaz60/mvits_for_class_agnostic_od
[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".
Language: Python - Size: 34.1 MB - Last synced at: 14 days ago - Pushed at: almost 2 years ago - Stars: 308 - Forks: 25

mbaqer/V2X-mmWave-Beamforming
PyTorch implementation of multi-modality sensing in 60 GHz mmWave beamforming for connected vehicles.
Language: Jupyter Notebook - Size: 5.09 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 2 - Forks: 0

3dlg-hcvc/tricolo
[WACV 2024] TriCoLo: Trimodal Contrastive Loss for Text to Shape Retrieval
Language: Python - Size: 7.17 MB - Last synced at: 16 days ago - Pushed at: about 1 year ago - Stars: 25 - Forks: 1

pliang279/MFN
[AAAI 2018] Memory Fusion Network for Multi-view Sequential Learning
Language: Python - Size: 56.7 MB - Last synced at: 12 days ago - Pushed at: over 4 years ago - Stars: 114 - Forks: 30

declare-lab/LLM-PuzzleTest
This repository is maintained to release datasets and models for multimodal puzzle reasoning.
Language: Python - Size: 131 MB - Last synced at: 13 days ago - Pushed at: about 2 months ago - Stars: 78 - Forks: 7

pej0918/Prompt-The-Missing
[CVPR 2025 Workshop] Prompt The Missing : Efficient and Robust Audio-Visual Classification under Uncertain Modalities
Language: Python - Size: 3.44 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 1 - Forks: 0

Pointcept/GPT4Point
[CVPR'24 Highlight] GPT4Point: A Unified Framework for Point-Language Understanding and Generation.
Language: Python - Size: 114 MB - Last synced at: 16 days ago - Pushed at: 12 months ago - Stars: 381 - Forks: 24

ArrowLuo/CLIP4Clip
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Language: Python - Size: 1.61 MB - Last synced at: 18 days ago - Pushed at: about 1 year ago - Stars: 929 - Forks: 126

henghuiding/ReLA
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
Language: Python - Size: 2.06 MB - Last synced at: 17 days ago - Pushed at: over 1 year ago - Stars: 693 - Forks: 19

aehrc/cxrmate
CXRMate: Longitudinal Data and a Semantic Similarity Reward for Chest X-Ray Report Generation
Language: Python - Size: 4.03 MB - Last synced at: 9 days ago - Pushed at: 2 months ago - Stars: 15 - Forks: 3

DmitryRyumin/ICCV-2023-Papers
ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included. ⭐ support visual intelligence development!
Language: Python - Size: 16.8 MB - Last synced at: 13 days ago - Pushed at: 8 months ago - Stars: 954 - Forks: 43

xieh97/language-based-audio-retrieval
List of academic resources on Language-Based Audio Retrieval
Size: 7.81 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 1 - Forks: 0

henghuiding/MeViS
[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions
Language: Python - Size: 52.2 MB - Last synced at: 16 days ago - Pushed at: 10 months ago - Stars: 521 - Forks: 22

Daming-W/EcoDatum
The official implementation of [Quality over Quantity: Boosting Data Efficiency Through Ensembled Multimodal Data Curation] in AAAI2025.
Size: 6.84 KB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 1 - Forks: 0

mhw32/multimodal-vae-public
A PyTorch implementation of "Multimodal Generative Models for Scalable Weakly-Supervised Learning" (https://arxiv.org/abs/1802.05335)
Language: Python - Size: 3.9 MB - Last synced at: about 9 hours ago - Pushed at: over 6 years ago - Stars: 158 - Forks: 36

miccunifi/SEARLE
[ICCV 2023] - Zero-shot Composed Image Retrieval with Textual Inversion
Language: Python - Size: 20.1 MB - Last synced at: 18 days ago - Pushed at: 12 months ago - Stars: 170 - Forks: 10

TencentARC/ViT-Lens
[CVPR 2024] ViT-Lens: Towards Omni-modal Representations
Language: Python - Size: 132 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 174 - Forks: 10

pliang279/MultiViz
[ICLR 2023] MultiViz: Towards Visualizing and Understanding Multimodal Models
Language: Python - Size: 790 MB - Last synced at: 9 days ago - Pushed at: 8 months ago - Stars: 96 - Forks: 5

sbelharbi/feature-vs-text-compound-emotion
Textualized and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild, ABAW 7th - Challenge - Compound Expression (CE) Recognition Challenge
Language: Python - Size: 1.41 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 4 - Forks: 0

ksm26/Open-Source-Models-with-Hugging-Face
"Open Source Models with Hugging Face" course empowers you with the skills to leverage open-source models from the Hugging Face Hub for various tasks in NLP, audio, image, and multimodal domains.
Language: Jupyter Notebook - Size: 21 MB - Last synced at: 24 days ago - Pushed at: about 1 year ago - Stars: 24 - Forks: 19

alipay/Ant-Multi-Modal-Framework
Research Code for Multimodal-Cognition Team in Ant Group
Language: Python - Size: 17 MB - Last synced at: 25 days ago - Pushed at: 9 months ago - Stars: 138 - Forks: 5

merveenoyan/siglip
Projects based on SigLIP (Zhai et al., 2023) and Hugging Face transformers integration 🤗
Language: Jupyter Notebook - Size: 1.66 MB - Last synced at: 17 days ago - Pushed at: about 2 months ago - Stars: 224 - Forks: 12

mims-harvard/Madrigal
Madrigal: Multimodal AI predicts clinical outcomes of drug combinations from preclinical data
Language: Jupyter Notebook - Size: 18.3 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 20 - Forks: 6

fork123aniket/Multi-Round-VLM-powered-Multimodal-Conversational-AI-Navigation-Bot
Streamlit App Combining Vision, Language, and Audio AI Models
Language: Python - Size: 18.6 KB - Last synced at: 23 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

Hoar012/RAP-MLLM
[CVPR 2025] RAP: Retrieval-Augmented Personalization
Language: Python - Size: 57.7 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 30 - Forks: 0

breezedeus/Coin-CLIP
Coin-CLIP: fine-tuned from CLIP on a vast collection of coin images using contrastive learning. It enhances feature extraction for coins, boosting image search accuracy. This model merges Vision Transformer (ViT) with CLIP's multimodal learning, optimized for numismatic applications.
Language: Python - Size: 50.3 MB - Last synced at: 19 days ago - Pushed at: over 1 year ago - Stars: 20 - Forks: 3

ai4ce/EgoPAT3D
[CVPR 2022] Egocentric Action Target Prediction in 3D
Language: Jupyter Notebook - Size: 93.3 MB - Last synced at: 10 days ago - Pushed at: over 1 year ago - Stars: 32 - Forks: 3

ZaneBrackley/VIZMed
Thesis Project | Vision-Integrated Zero-Shot Medical AI
Language: Python - Size: 85.9 KB - Last synced at: 24 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

Jorffy/DAIE
Code for "Dual-Level Adaptive Incongruity-Enhanced Model for Multimodal Sarcasm Detection".
Language: Python - Size: 183 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 20 - Forks: 0

praveena2j/RJCMA
ABAW6 (CVPR-W) We achieved second place in the valence arousal challenge of ABAW6
Language: Python - Size: 171 KB - Last synced at: 9 days ago - Pushed at: 11 months ago - Stars: 18 - Forks: 3

kyegomez/CM3Leon
An open source implementation of "Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning", an all-new multi modal AI that uses just a decoder to generate both text and images
Language: Python - Size: 754 KB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 359 - Forks: 18

pykale/pykale
Knowledge-Aware machine LEarning (KALE): accessible machine learning from multiple sources for interdisciplinary research, part of the 🔥PyTorch ecosystem. ⭐ Star to support our work!
Language: Python - Size: 46.4 MB - Last synced at: 8 days ago - Pushed at: 11 days ago - Stars: 455 - Forks: 64

taco-group/DecAlign
A novel cross-modal decoupling and alignment framework for multimodal representation learning.
Language: JavaScript - Size: 13.8 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

praveena2j/RJCAforSpeakerVerification
[FG 2024] "Audio-Visual Person Verification based on Recursive Fusion of Joint Cross-Attention"
Language: Python - Size: 1 MB - Last synced at: 15 days ago - Pushed at: 5 months ago - Stars: 4 - Forks: 0

pengfei-luo/multimodal-knowledge-graph
A collection of resources on multimodal knowledge graph, including datasets, papers and contests.
Size: 50.8 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 162 - Forks: 17

snap-research/MMVID
[CVPR 2022] Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning
Language: Python - Size: 77.5 MB - Last synced at: 13 days ago - Pushed at: almost 3 years ago - Stars: 192 - Forks: 23

zjunlp/HVPNeT
[NAACL 2022 Findings] Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction
Language: Python - Size: 1.88 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 108 - Forks: 11

OFA-Sys/OFASys
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Language: Python - Size: 20.3 MB - Last synced at: 14 days ago - Pushed at: over 2 years ago - Stars: 147 - Forks: 13

jyrao/UniSoccer
[CVPR 2025] "Towards Universal Soccer Video Understanding".
Language: Python - Size: 80.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 106 - Forks: 5

kyegomez/AutoRT
Implementation of AutoRT: "AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents"
Language: Python - Size: 2.49 MB - Last synced at: 11 days ago - Pushed at: 5 months ago - Stars: 39 - Forks: 3

HuaizhengZhang/Awsome-Deep-Learning-for-Video-Analysis
Papers, code and datasets about deep learning and multi-modal learning for video analysis
Size: 98.6 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 786 - Forks: 171

praveena2j/JointCrossAttentional-AV-Fusion
ABAW3 (CVPRW): A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
Language: Python - Size: 148 KB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 43 - Forks: 9

dlcjfgmlnasa/SynthSleepNet
[Arxiv] Toward Foundational Model for Sleep Analysis Using a Multimodal Hybrid Self-Supervised Learning Framework
Language: Python - Size: 521 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 7 - Forks: 2

RyanJJP/CHARMS
The code repository for ICML24 paper "Tabular Insights, Visual Impacts: Transferring Expertise from Tables to Images"
Language: Python - Size: 973 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 13 - Forks: 1

Xovee/skapp Fork of YifanZhang-git/SKAPP
AAAI '25. Retrieval-Augmented Multimodal Social Media Popularity Prediction
Language: Python - Size: 84 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 16 - Forks: 0

BUAADreamer/SPN4CIR
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives
Language: Python - Size: 4.2 MB - Last synced at: 8 days ago - Pushed at: 6 months ago - Stars: 30 - Forks: 3

marcomistretta/marcomistretta
Welcome to my GitHub page!
Size: 6 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

rajibrhasan/modality_gap
A repository for visualization of modality gap in VLMs
Language: Python - Size: 14.6 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

pliang279/factorized
[ICLR 2019] Learning Factorized Multimodal Representations
Language: Python - Size: 45.9 KB - Last synced at: 12 days ago - Pushed at: over 4 years ago - Stars: 67 - Forks: 10

rabiulcste/vismin
[NeurIPS24] VisMin: Visual Minimal-Change Understanding
Language: Python - Size: 66.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 12 - Forks: 1

rajibrhasan/LLaVA
This repository contains the implementation of a modified LLaVA architecture designed to address information imbalance between modalities in multimodal learning.
Language: Python - Size: 14.8 MB - Last synced at: 13 days ago - Pushed at: 27 days ago - Stars: 0 - Forks: 0

bryanbocao/vitag
Repository of the paper ViTag in SECON 2022 and demo (Best Demo Award).
Language: Python - Size: 401 KB - Last synced at: 16 days ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 1

pliang279/MultiBench
[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
Language: HTML - Size: 49.9 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 519 - Forks: 75

peirong26/UNA
[CVPR 2025] Unraveling Normal Anatomy via Fluid-Driven Anomaly Randomization
Language: Python - Size: 1.25 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 4 - Forks: 0

OmniTitanAI/OmniTitan-RL-AI
A universal RL engine transcending modality barriers, empowering cross-industry intelligence with superhuman decision efficiency. Created by @sudip_royedu
Language: Python - Size: 0 Bytes - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

kevalmorabia97/CoVA-Web-Object-Detection
A Context-aware Visual Attention-based training pipeline for Object Detection from a Webpage screenshot!
Language: Python - Size: 1.4 MB - Last synced at: 14 days ago - Pushed at: about 2 months ago - Stars: 92 - Forks: 14

IRVLUTD/Proto-CLIP
Code release for Proto-CLIP: Vision-Language Prototypical Network for Few-Shot Learning
Language: Python - Size: 69.1 MB - Last synced at: 15 days ago - Pushed at: 4 months ago - Stars: 39 - Forks: 6

minjoong507/BM-DETR
[WACV 2025] Official Pytorch code for "Background-aware Moment Detection for Video Moment Retrieval"
Language: Python - Size: 3.07 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 14 - Forks: 0
