GitHub topics: mllm-evaluation

Repositories

path2generalist/General-Level

On Path to Multimodal Generalist: General-Level and General-Bench

Language: Python - Size: 918 KB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 17 - Forks: 2

Lum1104/EIBench

Why We Feel: Breaking Boundaries in Emotional Reasoning with Multimodal Large Language Models

Language: Python - Size: 11.2 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 18 - Forks: 0

williamium3000/core-knowledge

Official codebase for ICML 2025 paper "Core Knowledge Deficits in Multi-Modal Language Models"

Size: 199 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

zhousheng97/EgoTextVQA

[CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering

Language: Python - Size: 9.64 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 33 - Forks: 1

luo-junyu/FinMME

[ACL 2025] FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation

Language: Python - Size: 1.21 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

EchoDreamer/Modality-Preference

Modality Preference

Language: Python - Size: 23 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

We introduce YesBut-v2, a benchmark for assessing AI's ability to interpret juxtaposed comic panels with contradictory narratives. Unlike existing benchmarks, it emphasizes visual understanding, comparative reasoning, and social knowledge.

Language: JavaScript - Size: 22.3 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 1 - Forks: 0

AdaCheng/VidEgoThink

The official code and data for the paper "VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI"

Language: Python - Size: 129 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 8 - Forks: 0

Now-Join-Us/OmniEvalKit (fork of AIDC-AI/M3Bench)

The code repository for "OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Models and Their Omni-Extensions"

Language: Python - Size: 3.82 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 13 - Forks: 2

simoncwang/MMO

Multimodal Multi-agent Organization and Benchmarking

Language: Python - Size: 74.2 KB - Last synced at: 5 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

Related Keywords

mllm-evaluation (10), benchmark (2), llm-evaluation (2), mllm (2), mllms (2), mllm-reasoning (2), llms (1), benchmarking (1), large-language-models (1), evaluation-framework (1), egocentric-videos (1), yesbut-v2 (1), yesbut (1), vlm (1), videoqa (1), scene-text-vqa (1), scene-text-videoqa (1), egocentric-qa-assistance (1), multi-modal-large-language-model (1), large-language-model (1), core-knowledge (1), emotion-reasoning (1), emotion-analysis (1), chain-of-thought-reasoning (1), multimodal-large-language-models (1), multimodal-generalist (1), llm (1)
