Topic: "large-vision-language-models"

BradyFU/Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

Size: 82.8 MB - Last synced at: 6 days ago - Pushed at: 10 days ago - Stars: 14,772 - Forks: 943

ShareGPT4Omni/ShareGPT4Video

[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Language: Python - Size: 7.73 MB - Last synced at: 11 days ago - Pushed at: 7 months ago - Stars: 1,053 - Forks: 41

NVlabs/DoRA

[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation

Language: Python - Size: 3.06 MB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 668 - Forks: 44

MME-Benchmarks/Video-MME

✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Size: 16.7 MB - Last synced at: 7 days ago - Pushed at: 8 days ago - Stars: 524 - Forks: 20

YingqingHe/Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

Language: HTML - Size: 12.7 MB - Last synced at: 10 days ago - Pushed at: 20 days ago - Stars: 455 - Forks: 26

Paranioar/Awesome_Matching_Pretraining_Transfering

A paper list covering large multi-modality models (perception, generation, unification), parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching, for preliminary insight.

Size: 369 KB - Last synced at: about 14 hours ago - Pushed at: 4 months ago - Stars: 425 - Forks: 48

burglarhobbit/Awesome-Medical-Large-Language-Models

Curated papers on Large Language Models in the healthcare and medical domain

Size: 154 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 273 - Forks: 32

tianyi-lab/HallusionBench

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Language: Python - Size: 11.1 MB - Last synced at: 3 months ago - Pushed at: 5 months ago - Stars: 270 - Forks: 8

ShareGPT4Omni/ShareGPT4V

[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions

Language: Python - Size: 644 KB - Last synced at: 17 days ago - Pushed at: 10 months ago - Stars: 211 - Forks: 5

khuangaf/Awesome-Chart-Understanding

A curated list of recent and past chart understanding work based on our IEEE TKDE survey paper: From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models.

Size: 130 KB - Last synced at: 9 days ago - Pushed at: 23 days ago - Stars: 196 - Forks: 19

llmbev/talk2bev

Talk2BEV: Language-Enhanced Bird's Eye View Maps (ICRA'24)

Language: Python - Size: 142 MB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 109 - Forks: 10

MMStar-Benchmark/MMStar

This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"

Language: Python - Size: 3.41 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 84 - Forks: 1

yu-rp/apiprompting

[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models

Language: Python - Size: 8.63 MB - Last synced at: 22 days ago - Pushed at: 7 months ago - Stars: 79 - Forks: 6

yfzhang114/LLaVA-Align

This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and a Visual Debias Decoding strategy.

Language: Python - Size: 64.9 MB - Last synced at: 15 days ago - Pushed at: 2 months ago - Stars: 77 - Forks: 2

mbzuai-oryx/GeoPixel

GeoPixel: A Pixel Grounding Large Multimodal Model for Remote Sensing is specifically developed for high-resolution remote sensing image analysis, offering advanced multi-target pixel grounding capabilities.

Language: Python - Size: 29.7 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 72 - Forks: 2

ys-zong/VLGuard

[ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models.

Language: Python - Size: 1.97 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 51 - Forks: 2

FudanDISC/ReForm-Eval

A benchmark for evaluating the capabilities of large vision-language models (LVLMs)

Language: Python - Size: 10 MB - Last synced at: 2 days ago - Pushed at: over 1 year ago - Stars: 46 - Forks: 4

Ruiyang-061X/Awesome-MLLM-Uncertainty

✨ A curated list of papers on uncertainty in multi-modal large language models (MLLMs).

Size: 381 KB - Last synced at: 2 days ago - Pushed at: 23 days ago - Stars: 44 - Forks: 0

sakura2233565548/TabPedia

This repository is the codebase of TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy

Language: Python - Size: 2.3 MB - Last synced at: 18 days ago - Pushed at: 6 months ago - Stars: 27 - Forks: 1

SuperBruceJia/Awesome-Large-Vision-Language-Model

Awesome Large Vision-Language Model: A Curated List of Large Vision-Language Models

Size: 103 KB - Last synced at: 5 days ago - Pushed at: 7 months ago - Stars: 27 - Forks: 3

SuperBruceJia/Awesome-Mixture-of-Experts

Awesome Mixture of Experts (MoE): A Curated List of Mixture of Experts (MoE) and Mixture of Multimodal Experts (MoME)

Size: 438 KB - Last synced at: 5 days ago - Pushed at: 3 months ago - Stars: 24 - Forks: 3

sled-group/moh

Official Repository of Multi-Object Hallucination in Vision-Language Models (NeurIPS 2024)

Language: Python - Size: 647 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 24 - Forks: 1

MSIIP/MedM-VL

MedM-VL is a modular, LLaVA-based codebase for medical LVLMs.

Language: Python - Size: 362 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 21 - Forks: 3

khuangaf/CHOCOLATE

Code and data for the paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning"

Language: Jupyter Notebook - Size: 3.48 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 20 - Forks: 0

The-Martyr/CausalMM

[ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality

Language: Python - Size: 2.99 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 17 - Forks: 1

The-Martyr/Awesome-Modality-Priors-in-MLLMs

Latest Advances on Modality Priors in Multimodal Large Language Models

Size: 70.3 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 13 - Forks: 1

bowen-upenn/Multi-Agent-VQA

[CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering

Language: Python - Size: 10.6 MB - Last synced at: 6 days ago - Pushed at: 7 months ago - Stars: 11 - Forks: 0

hiyamdebary/EarthDial

[CVPR 2025 🔥] EarthDial: Turning Multi-Sensory Earth Observations to Interactive Dialogues.

Language: Python - Size: 8.44 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 8 - Forks: 1

ShareGPT4Omni/ShareGPT4Omni

ShareGPT4Omni: Towards Building Omni Large Multi-modal Models with Comprehensive Multi-modal Annotations

Size: 0 Bytes - Last synced at: 13 days ago - Pushed at: 11 months ago - Stars: 8 - Forks: 0

CristianoPatricio/CBVLM

Code for the paper "CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification".

Language: Python - Size: 903 KB - Last synced at: about 16 hours ago - Pushed at: about 17 hours ago - Stars: 7 - Forks: 1

NKU-MetautoAI/awesome-large-vision-language-models

Advances in recent large vision language models (LVLMs)

Size: 32.7 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 7 - Forks: 0

gaotiexinqu/V2P-Bench

V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction

Size: 16.9 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 5 - Forks: 0

TIBHannover/patent-figure-classification

Official code for the ECIR 2025 paper "Patent Figure Classification using Large Vision-language Models"

Language: Python - Size: 216 KB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

afondiel/Prompt-Engineering-for-Vision-Models-DeepLearningAI

These notes and resources are compiled from the crash course Prompt Engineering for Vision Models offered by DeepLearning.AI.

Language: Jupyter Notebook - Size: 103 MB - Last synced at: 11 days ago - Pushed at: 8 months ago - Stars: 1 - Forks: 0

praj2408/End-to-end-LLM-and-image-model-application-using-Gemini-Pro

Gemini Pro, your do-it-all AI tool, translates languages, sparks creativity, and answers questions, all while efficiently running on devices from phones to data centers, making it accessible for developers and businesses to unlock AI's potential.

Language: Python - Size: 6.84 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0