An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: large-vision-language-model

BradyFU/Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

Size: 82.9 MB - Last synced at: 8 days ago - Pushed at: 14 days ago - Stars: 15,516 - Forks: 1,006

Ruiyang-061X/VL-Uncertainty

🔎Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation".

Language: Python - Size: 7.12 MB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 36 - Forks: 3

PKU-YuanGroup/MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language: Python - Size: 16.5 MB - Last synced at: 22 days ago - Pushed at: 7 months ago - Stars: 2,170 - Forks: 135

InternLM/InternLM-XComposer

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Language: Python - Size: 200 MB - Last synced at: 29 days ago - Pushed at: about 1 month ago - Stars: 2,834 - Forks: 172

PKU-YuanGroup/Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Language: Python - Size: 113 MB - Last synced at: 30 days ago - Pushed at: 7 months ago - Stars: 3,245 - Forks: 234

yaotingwangofficial/Awesome-MCoT

Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey

Size: 4.63 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 576 - Forks: 15

yu-rp/apiprompting

[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models

Language: Python - Size: 8.63 MB - Last synced at: 14 days ago - Pushed at: 8 months ago - Stars: 88 - Forks: 5

SuperBruceJia/Awesome-Large-Vision-Language-Model

Awesome Large Vision-Language Model: A Curated List of Large Vision-Language Models

Size: 103 KB - Last synced at: 5 days ago - Pushed at: 9 months ago - Stars: 27 - Forks: 3

lucaswychan/quant-lvlm

Easy-to-use large vision language model pipeline for quantitative analysis

Language: Python - Size: 953 KB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

jqtangust/hawk

🔥 🔥 🔥 [NeurIPS 2024] Hawk: Learning to Understand Open-World Video Anomalies

Language: Python - Size: 5.35 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 193 - Forks: 2

Orlando-CS/Awesome-VLA

✨✨ Latest advancements in VLA (Vision-Language-Action) models

Size: 0 Bytes - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

ADL-X/LLAVIDAL

This is the official repository of LLAVIDAL

Language: Python - Size: 32 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 13 - Forks: 1

ai4ce/LUWA

[CVPR 2024 Highlight] The first benchmark for lithic use-wear analysis leveraging SOTA vision and vision-language models (DINOv2, GPT-4V), demonstrating AI performance surpassing that of expert archaeologists.

Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0

richard-peng-xia/CARES

[NeurIPS'24 & ICMLW'24] CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

Language: Python - Size: 3.76 MB - Last synced at: 7 months ago - Pushed at: 9 months ago - Stars: 59 - Forks: 4

MMStar-Benchmark/MMStar

This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"

Language: Python - Size: 3.41 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 84 - Forks: 1

Related Keywords
large-vision-language-model (15)

instruction-tuning (4), large-language-models (4), large-vision-language-models (4), vision-language-model (4)

multimodal (3), multi-modality (3), multimodal-large-language-models (3), multi-modal (3)

large-multimodal-models (2), computer-vision (2), llm (2), multimodal-learning (2), chain-of-thought (2), multimodal-chain-of-thought (2)

pytorch (1), lvlms (1), vision-and-language (1), multimodality (1), natural-language-processing (1), machine-learning (1), general-artificial-intelligence (1), foundation-models (1), deep-learning (1), visual-question-answering (1), artificial-intelligence (1), artificial-general-intelligence (1), visual-prompting (1), vision-language-models (1), prompting (1), evaluation (1), trustworthy-ai (1), medical-multimodal-learning (1), llms (1), archeology (1), anthropology (1), ai4science (1), llvm (1), activities-of-daily-living (1), action-recognition (1), lvlm (1), video-understanding (1), video-anomaly-detection (1), video (1), anomaly-detection (1), anomaly (1), quantitative-finance (1), system-2 (1), chatgpt (1), moe (1), mixture-of-experts (1), vision-language (1), uncertainty-quantification (1), uncertainty-estimation (1), uncertainty-analysis (1), uncertainty (1), multi-modal-large-language-model (1), hallucination-evaluation (1), hallucination-detection (1), hallucination (1), visual-instruction-tuning (1), multimodal-instruction-tuning (1), multimodal-in-context-learning (1), instruction-following (1), in-context-learning (1), survey (1), slow-thinking (1), reasoning (1), openai-o1 (1), mllm-reasoning (1), mcts (1), deepseek-r1 (1), cot (1), visual-language-learning (1), vision-transformer (1), supervised-finetuning (1), mllm (1), large-language-model (1), language-model (1), gpt-4 (1), gpt (1), foundation (1)