GitHub topics: mllms
UCSC-VLAA/MedTrinity-25M
[ICLR 2025] This is the official repository of our paper "MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine"
Language: Python - Size: 1.32 MB - Last synced at: 3 days ago - Pushed at: about 2 months ago - Stars: 370 - Forks: 28

wanghao9610/X-SAM
X-SAM: From Segment Anything to Any Segmentation
Language: Python - Size: 98.9 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 204 - Forks: 6

TUM-AVS/FM-AD-Survey
This repository collects research papers on large Foundation Models for Scenario Generation and Analysis in Autonomous Driving. The repository will be continuously updated to track the latest developments.
Size: 17.4 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 76 - Forks: 7

maokangkun/SigmaFlow
SigmaFlow is a Python package designed to optimize the performance of task flows involving LLMs/MLLMs or multi-agent systems.
Language: JavaScript - Size: 8.18 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 2 - Forks: 0

Gnonymous/Web-CogReasoner
Web-CogReasoner: Towards Knowledge-Induced Cognitive Reasoning for Web Agents
Size: 6.59 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 5 - Forks: 0

HVision-NKU/GlimpsePrune
Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models"
Language: Python - Size: 49.3 MB - Last synced at: 26 days ago - Pushed at: about 1 month ago - Stars: 47 - Forks: 1

aim-uofa/SegAgent
[CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
Language: Python - Size: 53.4 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 62 - Forks: 1

GML-FMGroup/Awesome-MLLM-Reasoning
Size: 59.6 KB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 6 - Forks: 0

OS-Agent-Survey/OS-Agent-Survey
This is the repo for the paper "OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use" (ACL 2025 Oral).
Size: 11.6 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 319 - Forks: 14

swordlidev/Efficient-Multimodal-LLMs-Survey
Efficient Multimodal Large Language Models: A Survey
Size: 1.35 MB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 362 - Forks: 21

XduSyL/EventGPT
🔥[CVPR2025] EventGPT: Event Stream Understanding with Multimodal Large Language Models
Language: Python - Size: 19.4 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 68 - Forks: 5

xuyang-liu16/GlobalCom2
🚀 Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
Language: Python - Size: 6.24 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 30 - Forks: 1

JarvisUSTC/DiffPure-RobustVLM
ICCV 2025 official implementation for Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks
Language: Jupyter Notebook - Size: 100 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 8 - Forks: 0

ScalingOpt/SGG
[ACL 2025 Main] Taming LLMs by Scaling Learning Rates with Gradient Grouping
Language: JavaScript - Size: 2.63 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 8 - Forks: 0

path2generalist/General-Level
On Path to Multimodal Generalist: General-Level and General-Bench
Language: Python - Size: 918 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 17 - Forks: 2

aim-uofa/Omni-R1
Official Repo of Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
Language: Python - Size: 165 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 63 - Forks: 3

MicDZ/MANBench
MANBench: Is Your Multimodal Model Smarter than Human?
Language: Python - Size: 22.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

DingchenYang99/RedundancyCodebook
This is the official code repo of our work "Beyond Intermediate States: Explaining Visual Redundancy through Language". Paper at: https://arxiv.org/abs/2503.20540
Language: Jupyter Notebook - Size: 14.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 5 - Forks: 2

Fanziyang-v/MLLMs-Accelerator
Implementations of state-of-the-art training-free MLLM acceleration methods
Language: Python - Size: 11.3 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

aim-uofa/Active-o3
ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO
Size: 4.86 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 59 - Forks: 1

GeWu-Lab/Crab
[CVPR 2025] Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
Language: Python - Size: 30.5 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 25 - Forks: 0

XuankunRong/BYE
Backdoor Cleaning without External Guidance in MLLM Fine-tuning
Language: Python - Size: 8.77 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 6 - Forks: 0

yuanze-lin/Olympus
[CVPR 2025 Highlight] Official code for "Olympus: A Universal Task Router for Computer Vision Tasks"
Language: Python - Size: 3.5 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 425 - Forks: 71

rookie-littleblack/XpertEval
XpertEval: All-in-One Evaluation Framework for Multimodal Large Models
Language: Python - Size: 237 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

sun-hailong/TVC
🎉 [ACL 2025] The code repository for "Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning" in PyTorch.
Language: Python - Size: 9.69 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 13 - Forks: 0

GML-FMGroup/awesome_autonomous_agents
Size: 62.5 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

VILA-Lab/M-Attack
A Simple Baseline Achieving Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1. Paper at: https://arxiv.org/abs/2503.10635
Language: Python - Size: 32.4 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 56 - Forks: 1

MraDonkey/DMAD
[ICLR 2025] Breaking Mental Set to Improve Reasoning through Diverse Multi-Agent Debate
Language: Python - Size: 1.81 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 5 - Forks: 0

JaaackHongggg/WorldSense
WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs
Language: JavaScript - Size: 22.6 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 20 - Forks: 1

PanguIR/MRAGSurvey
A Survey of Multimodal Retrieval-Augmented Generation
Size: 4.92 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 11 - Forks: 1

924973292/IDEA
【CVPR2025】IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification
Language: Python - Size: 34.9 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 17 - Forks: 3

HJYao00/Awesome-Reasoning-MLLM
Awesome Reasoning in MLLMs: Papers and Projects about learning to reason with MLLMs, including Chain-of-Thought (CoT), OpenAI o1, and DeepSeek-R1
Size: 16.6 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 28 - Forks: 1

HashmatShadab/Robust-LLaVA
Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
Language: Python - Size: 69.6 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 2 - Forks: 0

ErFer7/Computer-Vision
This repository contains the code for the project developed for the Computer Vision course (Visão Computacional, INE410121) at UFSC.
Language: Jupyter Notebook - Size: 4.7 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

simoncwang/MMO
Multimodal Multi-agent Organization and Benchmarking
Language: Python - Size: 74.2 KB - Last synced at: 6 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0
