Topic: "multimodal-large-language-models"
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
Size: 82.9 MB - Last synced at: 4 days ago - Pushed at: 10 days ago - Stars: 15,444 - Forks: 1,002

X-PLUG/MobileAgent
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Language: Python - Size: 383 MB - Last synced at: 9 days ago - Pushed at: about 2 months ago - Stars: 4,281 - Forks: 430

joanrod/star-vector
StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textual inputs to produce high-quality SVG code with remarkable precision.
Language: Python - Size: 6.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3,570 - Forks: 186

modelscope/modelscope-agent
ModelScope-Agent: An agent framework connecting models in ModelScope with the world
Language: Python - Size: 68.1 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 3,157 - Forks: 358

ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Language: Python - Size: 3.28 MB - Last synced at: 18 days ago - Pushed at: 21 days ago - Stars: 2,923 - Forks: 197

VITA-MLLM/VITA
✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
Language: Python - Size: 15.7 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 2,184 - Forks: 164

X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Language: Python - Size: 105 MB - Last synced at: 9 days ago - Pushed at: 10 days ago - Stars: 2,177 - Forks: 126

cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Language: Python - Size: 1.99 MB - Last synced at: 11 days ago - Pushed at: 7 months ago - Stars: 1,905 - Forks: 132

YangLing0818/RPG-DiffusionMaster
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
Language: Jupyter Notebook - Size: 64.2 MB - Last synced at: 18 days ago - Pushed at: 4 months ago - Stars: 1,802 - Forks: 101

BAAI-DCAI/Bunny
A family of lightweight multimodal models.
Language: Python - Size: 28.5 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 1,015 - Forks: 74

ByteDance-Seed/Seed1.5-VL
Seed1.5-VL is a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning; it achieves state-of-the-art performance on 38 out of 60 public benchmarks.
Language: Jupyter Notebook - Size: 140 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 949 - Forks: 26

AIDC-AI/Ovis
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Language: Python - Size: 5.56 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 925 - Forks: 57

X-LANCE/SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
Language: Python - Size: 169 MB - Last synced at: 11 days ago - Pushed at: about 2 months ago - Stars: 811 - Forks: 79

richard-peng-xia/awesome-multimodal-in-medical-imaging
A collection of resources on applications of multi-modal learning in medical imaging.
Size: 309 KB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 753 - Forks: 68

LLaVA-VL/LLaVA-Plus-Codebase
LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills
Language: Python - Size: 19 MB - Last synced at: 30 days ago - Pushed at: over 1 year ago - Stars: 740 - Forks: 58

deepglint/unicom
Large-Scale Visual Representation Model
Language: Python - Size: 22.9 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 668 - Forks: 31

VITA-MLLM/Woodpecker
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
Language: Python - Size: 21.2 MB - Last synced at: 17 days ago - Pushed at: 6 months ago - Stars: 634 - Forks: 30

rese1f/MovieChat
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Language: Python - Size: 78.7 MB - Last synced at: 6 days ago - Pushed at: 4 months ago - Stars: 621 - Forks: 41

yaotingwangofficial/Awesome-MCoT
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Size: 4.63 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 576 - Forks: 15

MME-Benchmarks/Video-MME
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Size: 16.7 MB - Last synced at: 23 days ago - Pushed at: about 1 month ago - Stars: 551 - Forks: 20

SkyworkAI/Vitron
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Language: Python - Size: 667 MB - Last synced at: 8 days ago - Pushed at: 8 months ago - Stars: 541 - Forks: 34

NVIDIA/audio-flamingo
PyTorch implementation of Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities.
Language: Python - Size: 4.98 MB - Last synced at: 17 days ago - Pushed at: about 1 month ago - Stars: 484 - Forks: 27

ictnlp/LLaVA-Mini
LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.
Language: Python - Size: 54.6 MB - Last synced at: 17 days ago - Pushed at: 5 months ago - Stars: 479 - Forks: 22

YingqingHe/Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).
Language: HTML - Size: 12.7 MB - Last synced at: 29 days ago - Pushed at: 2 months ago - Stars: 472 - Forks: 26

Paranioar/Awesome_Matching_Pretraining_Transfering
The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insight.
Size: 369 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 423 - Forks: 48

Coobiw/MPP-LLaVA
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.
Language: Jupyter Notebook - Size: 73.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 420 - Forks: 23

hustvl/EVF-SAM
Official code of "EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model"
Language: Python - Size: 5.94 MB - Last synced at: 17 days ago - Pushed at: 3 months ago - Stars: 406 - Forks: 19

jingyi0000/R1-VL
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Language: Python - Size: 2.36 MB - Last synced at: 7 days ago - Pushed at: about 1 month ago - Stars: 392 - Forks: 0

HenryHZY/Awesome-Multimodal-LLM
Research Trends in LLM-guided Multimodal Learning.
Size: 17.6 KB - Last synced at: 24 days ago - Pushed at: over 1 year ago - Stars: 358 - Forks: 16

FoundationVision/Liquid
Liquid: Language Models are Scalable and Unified Multi-modal Generators
Language: Python - Size: 31.6 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 353 - Forks: 24

baaivision/EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
Language: Python - Size: 6.95 MB - Last synced at: 16 days ago - Pushed at: 3 months ago - Stars: 326 - Forks: 8

burglarhobbit/Awesome-Medical-Large-Language-Models
Curated papers on Large Language Models in Healthcare and Medical domain
Size: 53.7 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 316 - Forks: 37

tsujuifu/pytorch_mgie
A Gradio demo of MGIE
Language: Python - Size: 32.5 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 307 - Forks: 24

X-PLUG/Youku-mPLUG
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Language: Python - Size: 15.1 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 297 - Forks: 11

zjysteven/lmms-finetune
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.
Language: Python - Size: 13 MB - Last synced at: 22 days ago - Pushed at: 3 months ago - Stars: 296 - Forks: 33

IrohXu/Awesome-Multimodal-LLM-Autonomous-Driving
[WACV 2024 Survey Paper] Multimodal Large Language Models for Autonomous Driving
Size: 15 MB - Last synced at: about 20 hours ago - Pushed at: about 1 year ago - Stars: 289 - Forks: 11

AIDC-AI/Awesome-Unified-Multimodal-Models
Awesome Unified Multimodal Models
Size: 6.97 MB - Last synced at: 4 days ago - Pushed at: 17 days ago - Stars: 279 - Forks: 6

VITA-MLLM/Freeze-Omni
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Language: Python - Size: 10.1 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 257 - Forks: 16

mbzuai-oryx/LLMVoX
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM
Language: Python - Size: 132 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 246 - Forks: 27

apple/ml-slowfast-llava
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
Language: Python - Size: 375 KB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 217 - Forks: 13

THUDM/VisualAgentBench
Towards Large Multimodal Models as Visual Foundation Agents
Language: Python - Size: 5.56 MB - Last synced at: 15 days ago - Pushed at: about 2 months ago - Stars: 214 - Forks: 6

JUNJIE99/MLVU
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark
Language: Python - Size: 2.51 MB - Last synced at: 8 days ago - Pushed at: 9 days ago - Stars: 201 - Forks: 1

friedrichor/Awesome-Multimodal-Papers
A curated list of awesome Multimodal studies.
Language: HTML - Size: 63.3 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 192 - Forks: 19

cyw-3d/SAR3D
Official repository for "SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE"
Language: Python - Size: 11.8 MB - Last synced at: about 4 hours ago - Pushed at: about 5 hours ago - Stars: 154 - Forks: 1

shaopengw/Awesome-Music-Generation
Awesome music generation model: MG²
Language: Python - Size: 3.16 MB - Last synced at: 15 days ago - Pushed at: 2 months ago - Stars: 154 - Forks: 10

baaivision/DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
Language: Python - Size: 18.1 MB - Last synced at: 14 days ago - Pushed at: 6 months ago - Stars: 145 - Forks: 1

scofield7419/Video-of-Thought
Video Chain of Thought, Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
Language: Python - Size: 1.72 MB - Last synced at: 14 days ago - Pushed at: 3 months ago - Stars: 141 - Forks: 7

dvlab-research/VisionReasoner
The official implementation of "VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning"
Language: Python - Size: 12.1 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 130 - Forks: 8

pipixin321/HolmesVAD
Official implementation of "Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM"
Language: Python - Size: 19.3 MB - Last synced at: 14 days ago - Pushed at: 3 months ago - Stars: 129 - Forks: 5

NishilBalar/Awesome-LVLM-Hallucination
An up-to-date curated list of state-of-the-art research, papers, and resources on hallucination in large vision-language models
Size: 189 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 125 - Forks: 6

zjukg/KoPA
[Paper][ACM MM 2024] Making Large Language Models Perform Better in Knowledge Graph Completion
Language: Python - Size: 2.85 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 123 - Forks: 8

shufangxun/LLaVA-MoD
[ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
Language: Python - Size: 3.41 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 120 - Forks: 7

danilop/multimodal-chat
A multimodal chat interface with many tools.
Language: Python - Size: 254 KB - Last synced at: 14 days ago - Pushed at: 3 months ago - Stars: 119 - Forks: 18

OpenGVLab/MM-NIAH
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.
Language: Python - Size: 2.83 MB - Last synced at: 2 days ago - Pushed at: 7 months ago - Stars: 117 - Forks: 6

rese1f/aurora
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Language: Python - Size: 25.3 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 108 - Forks: 5

lll6gg/UI-R1
Code for "UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning"
Language: Python - Size: 1.04 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 104 - Forks: 6

invictus717/MiCo
Explore the Limits of Omni-modal Pretraining at Scale
Language: Python - Size: 11.6 MB - Last synced at: 2 months ago - Pushed at: 9 months ago - Stars: 97 - Forks: 4

X-PLUG/mPLUG-HalOwl
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation
Language: Python - Size: 13.9 MB - Last synced at: 9 days ago - Pushed at: over 1 year ago - Stars: 95 - Forks: 2

OpenGVLab/PIIP
[NeurIPS 2024 Spotlight ⭐️] Parameter-Inverted Image Pyramid Networks (PIIP)
Language: Python - Size: 11.7 MB - Last synced at: 2 days ago - Pushed at: 24 days ago - Stars: 91 - Forks: 2

mu-cai/matryoshka-mm
Matryoshka Multimodal Models
Language: Python - Size: 26.1 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 90 - Forks: 5

LINs-lab/DynMoE
[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Language: Python - Size: 57.3 MB - Last synced at: 28 days ago - Pushed at: 4 months ago - Stars: 89 - Forks: 11

showlab/LOVA3
(NeurIPS 2024) Official PyTorch implementation of LOVA3
Language: Python - Size: 6.01 MB - Last synced at: 19 days ago - Pushed at: 3 months ago - Stars: 85 - Forks: 2

Sreyan88/GAMA
Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
Language: Python - Size: 15.6 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 84 - Forks: 9

zjunlp/Deco
[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
Language: Python - Size: 17.6 MB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 82 - Forks: 7

Haochen-Wang409/ross
[ICLR'25] Reconstructive Visual Instruction Tuning
Language: Python - Size: 12.8 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 75 - Forks: 3

xjywhu/Awesome-Multimodal-LLM-for-Code
Multimodal Large Language Models for Code Generation under Multimodal Scenarios
Size: 234 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 72 - Forks: 2

ritzz-ai/GUI-R1
Official implementation of GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents
Language: Python - Size: 974 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 70 - Forks: 5

vincentlux/Awesome-Multimodal-LLM
Reading list for Multimodal Large Language Models
Size: 110 KB - Last synced at: 26 days ago - Pushed at: almost 2 years ago - Stars: 68 - Forks: 7

keshik6/HourVideo
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
Language: Jupyter Notebook - Size: 8.16 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 67 - Forks: 3

AviSoori1x/seemore
From scratch implementation of a vision language model in pure PyTorch
Language: Jupyter Notebook - Size: 20 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 67 - Forks: 4

ai4colonoscopy/IntelliScope
Frontiers in Intelligent Colonoscopy [ColonSurvey | ColonINST | ColonGPT]
Language: Python - Size: 32.3 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 66 - Forks: 4

gyxxyg/TRACE
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling
Language: Python - Size: 45.7 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 60 - Forks: 0

ChocoWu/SeTok
Codes for Paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM
Language: Python - Size: 2.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 54 - Forks: 0

JinXins/Awesome-Token-Merge-for-MLLMs
A paper list about Token Merge, Reduce, Resample, Drop for MLLMs.
Size: 103 KB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 54 - Forks: 0

zjunlp/OceanGPT
[沧渊] [ACL 2024] OceanGPT: A Large Language Model for Ocean Science Tasks
Language: Python - Size: 38.4 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 53 - Forks: 7

Hoar012/RAP-MLLM
[CVPR 2025] RAP: Retrieval-Augmented Personalization
Language: Python - Size: 60.9 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 53 - Forks: 1

RainBowLuoCS/OpenOmni
OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Real-Time Self-Aware Emotional Speech Synthesis
Language: Python - Size: 8.45 MB - Last synced at: 10 days ago - Pushed at: 11 days ago - Stars: 51 - Forks: 5

Victorwz/MLM_Filter
Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".
Language: Python - Size: 30.7 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 51 - Forks: 1

MSR3D/MSR3D
[NeurIPS 2024] Official code repository for MSR3D paper
Language: Python - Size: 75.7 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 50 - Forks: 2

IDEA-FinAI/ChartMoE
[ICLR2025 Oral] ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding
Language: Jupyter Notebook - Size: 9.76 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 50 - Forks: 1

richard-peng-xia/RULE
[EMNLP'24] RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models
Language: Python - Size: 7.2 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 50 - Forks: 3

Hangover3832/ComfyUI-Hangover-Moondream 📦
Moondream is a lightweight multimodal large language model
Language: Python - Size: 2.14 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 46 - Forks: 7

weihaox/UMBRAE
[ECCV 2024] UMBRAE: Unified Multimodal Brain Decoding | Unveiling the 'Dark Side' of Brain Modality
Language: Jupyter Notebook - Size: 34.6 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 46 - Forks: 3

OpenKG-ORG/EasyDetect
An Easy-to-use Hallucination Detection Framework for LLMs.
Language: Python - Size: 12 MB - Last synced at: 10 months ago - Pushed at: about 1 year ago - Stars: 45 - Forks: 3

Wang-ML-Lab/multimodal-needle-in-a-haystack
[NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models
Language: Python - Size: 16.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 42 - Forks: 3

taco-group/Re-Align
A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.
Language: Python - Size: 18.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 40 - Forks: 1

scofield7419/EmpathyEar
Multimodal Empathetic Chatbot
Language: Python - Size: 423 MB - Last synced at: 5 days ago - Pushed at: 11 months ago - Stars: 40 - Forks: 6

pipixin321/HolmesVAU
[CVPR 2025] Official implementation of "Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity"
Language: Python - Size: 60.1 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 39 - Forks: 2

mbzuai-oryx/ALM-Bench
[CVPR 2025 🔥] ALM-Bench is a multilingual, multimodal benchmark of diverse cultures covering 100 languages across 19 categories. It assesses the next generation of LMMs on cultural inclusivity.
Language: Python - Size: 26.7 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 38 - Forks: 2

UKPLab/5pils
Code associated with the EMNLP 2024 Main paper: "Image, tell me your story!" Predicting the original meta-context of visual misinformation.
Language: Python - Size: 3.38 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 38 - Forks: 4

RaptorMai/MLLM-CompBench
[NeurIPS'25] MLLM-CompBench evaluates the comparative reasoning of MLLMs with 40K image pairs and questions across 8 dimensions of relative comparison: visual attribute, existence, state, emotion, temporality, spatiality, quantity, and quality. CompBench covers diverse visual domains, including animals, fashion, sports, and scenes.
Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 38 - Forks: 2

cocacola-lab/MineLand
Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs
Language: Python - Size: 83.1 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 37 - Forks: 4

AIDC-AI/Parrot
🎉 The code repository for "Parrot: Multilingual Visual Instruction Tuning" in PyTorch.
Language: Python - Size: 25.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 36 - Forks: 1

VisualWebBench/VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
Language: Python - Size: 3.17 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 36 - Forks: 1

piomin/spring-ai-showcase
Sample Spring AI Application with several use cases
Language: Java - Size: 3.95 MB - Last synced at: 9 days ago - Pushed at: 14 days ago - Stars: 35 - Forks: 18

EternityYW/Gemini-Commonsense-Evaluation
Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"
Size: 16.9 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 35 - Forks: 2

whwu95/FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
Language: Python - Size: 3.22 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 34 - Forks: 0

Lzcstan/DrugLAMP
A PyTorch-based system for highly accurate drug-target interaction predictions utilizing multi-modal large language models to discern structural affinities in drug-target pairs.
Language: Python - Size: 128 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 32 - Forks: 0

GLUS-video/GLUS
[CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation
Language: Jupyter Notebook - Size: 66.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 31 - Forks: 2

zjunlp/EasyDetect
[ACL 2024] An Easy-to-use Hallucination Detection Framework for LLMs.
Language: Python - Size: 11.5 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 30 - Forks: 1
