An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: multimodal-foundation-model

ligengen/EgoM2P

[ICCV 2025] The official implementation for EgoM2P: Egocentric Multimodal Multitask Pretraining.

Size: 1.93 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 8 - Forks: 1

xid32/NAACL_2025_TWM

We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFMs). This plug-and-play module can be easily integrated into existing MFMs. With TWM, nine state-of-the-art models exhibit significant performance improvements across QA, captioning, and retrieval tasks.

Language: Python - Size: 896 KB - Last synced at: 3 days ago - Pushed at: 6 months ago - Stars: 309 - Forks: 30
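The "plug-and-play" idea behind a module like TWM can be illustrated with a minimal sketch: select, under a fixed memory budget, the video frames most relevant to the current query before handing them to the downstream model. This is a hypothetical illustration of query-conditioned temporal selection, not the repository's actual implementation; the function name, shapes, and scoring rule are assumptions.

```python
import numpy as np

def temporal_working_memory(frames: np.ndarray, query: np.ndarray, budget: int) -> np.ndarray:
    """Hypothetical sketch: keep the `budget` frames most relevant to the query.

    frames: [T, D] per-frame embeddings; query: [D] query embedding.
    Returns the selected frames in their original temporal order, so the
    condensed sequence can be fed to an unmodified downstream model.
    """
    # Cosine similarity between each frame and the query.
    f = frames / (np.linalg.norm(frames, axis=1, keepdims=True) + 1e-8)
    q = query / (np.linalg.norm(query) + 1e-8)
    scores = f @ q  # shape [T]
    # Indices of the top-`budget` frames, re-sorted to preserve temporal order.
    keep = np.sort(np.argsort(scores)[-budget:])
    return frames[keep]

# Usage: condense 8 frame embeddings into a 3-frame "working memory".
rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 16))
query = rng.standard_normal(16)
memory = temporal_working_memory(frames, query, budget=3)
print(memory.shape)  # (3, 16)
```

Because the selection step only shrinks the input sequence, a module of this shape can sit in front of an existing model without changing its weights or interface, which is what "plug-and-play" implies here.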

mahmoodlab/MADELEINE

MADELEINE: multi-stain slide representation learning (ECCV'24)

Language: Python - Size: 22.9 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 52 - Forks: 5

MJ-Bench/MJ-Bench

Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"

Language: Jupyter Notebook - Size: 218 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 43 - Forks: 5

TXH-mercury/VAST

Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

Language: Jupyter Notebook - Size: 73.2 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 144 - Forks: 5