GitHub topics: multimodal-foundation-model
ligengen/EgoM2P
[ICCV 2025] The official implementation for EgoM2P: Egocentric Multimodal Multitask Pretraining.
Size: 1.93 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 8 - Forks: 1

xid32/NAACL_2025_TWM
We introduce temporal working memory (TWM), a plug-and-play module that enhances the temporal modeling capabilities of multimodal foundation models (MFMs) and can be easily integrated into existing MFMs. With TWM, nine state-of-the-art models show significant performance improvements across question answering, captioning, and retrieval tasks.
Language: Python - Size: 896 KB - Last synced at: 3 days ago - Pushed at: 6 months ago - Stars: 309 - Forks: 30
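
A minimal sketch of the plug-and-play idea the TWM description refers to: a small module that scores per-frame features and keeps only a fixed-size "working memory" of the most informative frames before they reach a frozen MFM backbone. All names here (TemporalWorkingMemory, memory_size) are illustrative assumptions, not the repo's actual API.

```python
# Illustrative sketch only; class and parameter names are assumptions,
# not the xid32/NAACL_2025_TWM API.
import torch
import torch.nn as nn


class TemporalWorkingMemory(nn.Module):
    """Retains a fixed-size memory of the top-scoring frames,
    preserving their original temporal order."""

    def __init__(self, dim: int, memory_size: int = 16):
        super().__init__()
        self.memory_size = memory_size
        self.scorer = nn.Linear(dim, 1)  # learned relevance score per frame

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, num_frames, dim)
        b, t, d = frame_feats.shape
        if t <= self.memory_size:
            return frame_feats
        scores = self.scorer(frame_feats).squeeze(-1)        # (b, t)
        topk = scores.topk(self.memory_size, dim=1).indices  # (b, k)
        topk, _ = topk.sort(dim=1)                           # keep temporal order
        idx = topk.unsqueeze(-1).expand(-1, -1, d)           # (b, k, d)
        return frame_feats.gather(1, idx)                    # (b, k, d)


# Usage: slot between a video encoder and a frozen MFM backbone.
twm = TemporalWorkingMemory(dim=768, memory_size=16)
feats = torch.randn(2, 64, 768)  # 64 frames of 768-d features
compressed = twm(feats)          # -> (2, 16, 768), fed onward to the MFM
```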

mahmoodlab/MADELEINE
MADELEINE: multi-stain slide representation learning (ECCV'24)
Language: Python - Size: 22.9 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 52 - Forks: 5

MJ-Bench/MJ-Bench
Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"
Language: Jupyter Notebook - Size: 218 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 43 - Forks: 5

TXH-mercury/VAST
Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Language: Jupyter Notebook - Size: 73.2 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 144 - Forks: 5
