Topic: "video-language-model"
Coobiw/MPP-LLaVA
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {SFT/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-style MLLM on a 24 GB RTX 3090/4090.
Language: Jupyter Notebook - Size: 73.1 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 420 - Forks: 23

TencentARC/ST-LLM
[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"
Language: Python - Size: 19 MB - Last synced at: 4 months ago - Pushed at: 12 months ago - Stars: 145 - Forks: 5

patrick-tssn/VideoHallucer
VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
Language: Python - Size: 21.1 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 27 - Forks: 0

moucheng2017/SOP-LVM-ICL-Ensemble
[NeurIPS VLM workshop 2024] In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models for Low-Level Workflow Understanding
Language: Python - Size: 834 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 22 - Forks: 3

ekazakos/grove
Code implementation for the paper "Large-scale Pre-training for Grounded Video Caption Generation" (ICCV 2025)
Language: Python - Size: 8.73 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 19 - Forks: 0

bigai-nlco/NeedleInAVideoHaystack Fork of patrick-tssn/MM-NIAVH
Pressure-testing large video-language models (LVLMs): performing multimodal retrieval from an LVLM at various video lengths to measure accuracy
Language: Python - Size: 29.7 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 7 - Forks: 0

ALucek/multimodal-llm-breakdown
Outlining and demonstrating how multimodal language models understand image, video, and text content.
Language: Jupyter Notebook - Size: 14.7 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0
