An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: video-large-language-models

xuyang-liu16/VidCom2

Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models

Language: Python - Size: 5.52 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 20 - Forks: 0

minjoong507/Consistency-of-Video-LLM

[CVPR 2025] Official Repository of the paper "On the Consistency of Video Large Language Models in Temporal Comprehension"

Language: Python - Size: 1.04 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 8 - Forks: 0

ttengwang/Awesome_Long_Form_Video_Understanding

Awesome papers & datasets specifically focused on long-term videos.

Size: 44.9 KB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 270 - Forks: 12

MAC-AutoML/QuoTA

This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension"

Language: Python - Size: 8.73 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 68 - Forks: 2

Coobiw/MPP-LLaVA

Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.

Language: Jupyter Notebook - Size: 73.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 420 - Forks: 23

Leon1207/Video-RAG-master

This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"

Language: Python - Size: 436 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 116 - Forks: 12

gyxxyg/TRACE

[ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling

Language: Python - Size: 45.7 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 60 - Forks: 0

gyxxyg/VTG-LLM

[AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding

Language: Python - Size: 88 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 70 - Forks: 1