Topic: "multi-modal-large-language-model"
Kobaayyy/Awesome-CVPR2025-CVPR2024-ECCV2024-AIGC
A Collection of Papers and Codes for CVPR2025/CVPR2024/ECCV2024 AIGC
Size: 351 KB - Last synced at: 8 days ago - Pushed at: about 1 month ago - Stars: 540 - Forks: 14

Leon1207/Video-RAG-master
This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"
Language: Python - Size: 436 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 116 - Forks: 12

gyxxyg/VTG-LLM
[AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
Language: Python - Size: 88 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 70 - Forks: 1

songweii/DualToken
Unifying Visual Understanding and Generation with Dual Visual Vocabularies 🌈
Language: Python - Size: 8.09 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 38 - Forks: 0

Ruiyang-061X/VL-Uncertainty
🔎Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation".
Language: Python - Size: 7.12 MB - Last synced at: 23 days ago - Pushed at: about 2 months ago - Stars: 31 - Forks: 2

Ruiyang-061X/Awesome-MLLM-Reasoning
📖Curated list about reasoning ability of MLLM, including OpenAI o1, OpenAI o3-mini, and Slow-Thinking.
Size: 7.81 KB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 6 - Forks: 0
