GitHub topics: video-language-model

Repositories

TencentARC/ST-LLM

[ECCV 2024🔥] Official implementation of the paper "ST-LLM: Large Language Models Are Effective Temporal Learners"

Language: Python - Size: 19 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 145 - Forks: 5

patrick-tssn/VideoHallucer

VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)

Language: Python - Size: 21.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 27 - Forks: 0

ekazakos/grove

Code implementation for the paper "Large-scale Pre-training for Grounded Video Caption Generation"

Size: 8.59 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 5 - Forks: 0

moucheng2017/SOP-LVM-ICL-Ensemble

[NeurIPS VLM workshop 2024] In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models for Low-Level Workflow Understanding

Language: Python - Size: 834 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 22 - Forks: 3

ALucek/multimodal-llm-breakdown

Outlining and demonstrating how language models are able to understand image, video, and text content.

Language: Jupyter Notebook - Size: 14.7 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

Personal project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.

Language: Jupyter Notebook - Size: 73.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 420 - Forks: 23

bigai-nlco/NeedleInAVideoHaystack (fork of patrick-tssn/MM-NIAVH)

Pressure Testing Large Video-Language Models (LVLM): Doing multimodal retrieval from LVLM at various video lengths to measure accuracy

Language: Python - Size: 29.7 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 7 - Forks: 0

Related Keywords

video-language-model (7), multimodal-large-language-models (4), needle-in-a-haystack (1), llms (1), video-large-language-models (1), qwen (1), pretraining (1), pipeline-parallelism (1), model-parallel (1), mllm (1), fine-tuning (1), deepspeed (1), vision-language-model (1), multimodal (1), audio-language-model (1), sops (1), pseudo-labeling (1), in-context-learning (1), in-context-ensemble (1), ensemble (1), vision-language (1), video-language-pretrainng (1), video-grounding (1), video-captioning (1), large-scale-pretraining (1), automatic-annotation (1), video-hallucination (1), hallucination-detection (1), video-understanding (1), large-language-models (1)
