Topic: "large-vision-language-models"
BradyFU/Awesome-Multimodal-Large-Language-Models
Latest Advances on Multimodal Large Language Models
Size: 82.8 MB - Last synced at: 6 days ago - Pushed at: 10 days ago - Stars: 14,772 - Forks: 943

ShareGPT4Omni/ShareGPT4Video
[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Language: Python - Size: 7.73 MB - Last synced at: 11 days ago - Pushed at: 7 months ago - Stars: 1,053 - Forks: 41

NVlabs/DoRA
[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation
Language: Python - Size: 3.06 MB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 668 - Forks: 44

MME-Benchmarks/Video-MME
[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Size: 16.7 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 524 - Forks: 20

YingqingHe/Awesome-LLMs-meet-Multimodal-Generation
A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio).
Language: HTML - Size: 12.7 MB - Last synced at: 10 days ago - Pushed at: 20 days ago - Stars: 455 - Forks: 26

Paranioar/Awesome_Matching_Pretraining_Transfering
A paper list on Large Multi-Modality Models (perception, generation, unification), parameter-efficient fine-tuning, vision-language pretraining, and conventional image-text matching for preliminary insight.
Size: 369 KB - Last synced at: about 14 hours ago - Pushed at: 4 months ago - Stars: 425 - Forks: 48

burglarhobbit/Awesome-Medical-Large-Language-Models
Curated papers on Large Language Models in the healthcare and medical domains.
Size: 154 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 273 - Forks: 32

tianyi-lab/HallusionBench
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Language: Python - Size: 11.1 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 270 - Forks: 8

ShareGPT4Omni/ShareGPT4V
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
Language: Python - Size: 644 KB - Last synced at: 17 days ago - Pushed at: 10 months ago - Stars: 211 - Forks: 5

khuangaf/Awesome-Chart-Understanding
A curated list of recent and past chart understanding work based on our IEEE TKDE survey paper: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models.
Size: 130 KB - Last synced at: 9 days ago - Pushed at: 23 days ago - Stars: 196 - Forks: 19

llmbev/talk2bev
Talk2BEV: Language-Enhanced Bird's Eye View Maps (ICRA'24)
Language: Python - Size: 142 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 109 - Forks: 10

MMStar-Benchmark/MMStar
This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
Language: Python - Size: 3.41 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 84 - Forks: 1

yu-rp/apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
Language: Python - Size: 8.63 MB - Last synced at: 22 days ago - Pushed at: 7 months ago - Stars: 79 - Forks: 6

yfzhang114/LLaVA-Align
The official repository for Debiasing Large Visual Language Models, including a post-hoc debiasing method and a Visual Debias Decoding strategy.
Language: Python - Size: 64.9 MB - Last synced at: 15 days ago - Pushed at: 2 months ago - Stars: 77 - Forks: 2

mbzuai-oryx/GeoPixel
GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing, developed specifically for high-resolution remote sensing image analysis and offering multi-target pixel grounding.
Language: Python - Size: 29.7 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 72 - Forks: 2

ys-zong/VLGuard
[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.
Language: Python - Size: 1.97 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 51 - Forks: 2

FudanDISC/ReForm-Eval
A benchmark for evaluating the capabilities of large vision-language models (LVLMs).
Language: Python - Size: 10 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 46 - Forks: 4

Ruiyang-061X/Awesome-MLLM-Uncertainty
A curated list of papers on uncertainty in multimodal large language models (MLLMs).
Size: 381 KB - Last synced at: 2 days ago - Pushed at: 23 days ago - Stars: 44 - Forks: 0

sakura2233565548/TabPedia
This repository is the codebase of TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy
Language: Python - Size: 2.3 MB - Last synced at: 18 days ago - Pushed at: 6 months ago - Stars: 27 - Forks: 1

SuperBruceJia/Awesome-Large-Vision-Language-Model
Awesome Large Vision-Language Model: A Curated List of Large Vision-Language Model
Size: 103 KB - Last synced at: 5 days ago - Pushed at: 7 months ago - Stars: 27 - Forks: 3

SuperBruceJia/Awesome-Mixture-of-Experts
Awesome Mixture of Experts (MoE): A Curated List of Mixture of Experts (MoE) and Mixture of Multimodal Experts (MoME)
Size: 438 KB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 24 - Forks: 3

sled-group/moh
Official Repository of Multi-Object Hallucination in Vision-Language Models (NeurIPS 2024)
Language: Python - Size: 647 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 24 - Forks: 1

MSIIP/MedM-VL
MedM-VL is a modular, LLaVA-based codebase for medical LVLMs.
Language: Python - Size: 362 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 21 - Forks: 3

khuangaf/CHOCOLATE
Code and data for the paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning"
Language: Jupyter Notebook - Size: 3.48 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 20 - Forks: 0

The-Martyr/CausalMM
[ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
Language: Python - Size: 2.99 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 17 - Forks: 1

The-Martyr/Awesome-Modality-Priors-in-MLLMs
Latest Advances on Modality Priors in Multimodal Large Language Models
Size: 70.3 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 13 - Forks: 1

bowen-upenn/Multi-Agent-VQA
[CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering
Language: Python - Size: 10.6 MB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 11 - Forks: 0

hiyamdebary/EarthDial
[CVPR 2025] EarthDial: Turning Multi-Sensory Earth Observations to Interactive Dialogues.
Language: Python - Size: 8.44 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 8 - Forks: 1

ShareGPT4Omni/ShareGPT4Omni
ShareGPT4Omni: Towards Building Omni Large Multi-modal Models with Comprehensive Multi-modal Annotations
Size: 0 Bytes - Last synced at: 13 days ago - Pushed at: 11 months ago - Stars: 8 - Forks: 0

CristianoPatricio/CBVLM
Code for the paper "CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification".
Language: Python - Size: 903 KB - Last synced at: about 16 hours ago - Pushed at: about 17 hours ago - Stars: 7 - Forks: 1

NKU-MetautoAI/awesome-large-vision-language-models
Advances in recent large vision language models (LVLMs)
Size: 32.7 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 7 - Forks: 0

gaotiexinqu/V2P-Bench
V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction
Size: 16.9 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 5 - Forks: 0

TIBHannover/patent-figure-classification
Official code for the ECIR 2025 paper "Patent Figure Classification using Large Vision-language Models".
Language: Python - Size: 216 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

afondiel/Prompt-Engineering-for-Vision-Models-DeepLearningAI
These notes and resources are compiled from the crash course Prompt Engineering for Vision Models offered by DeepLearning.AI.
Language: Jupyter Notebook - Size: 103 MB - Last synced at: 11 days ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

praj2408/End-to-end-LLM-and-image-model-application-using-Gemini-Pro
An end-to-end LLM and image-model application built on Gemini Pro, a general-purpose AI model that translates languages, generates creative content, and answers questions. It runs efficiently on devices ranging from phones to data centers, which makes it accessible to developers and businesses.
Language: Python - Size: 6.84 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0
