GitHub topics: mllm
yuecao0119/MMFuser
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". MMFuser addresses the limitations of current MLLMs in capturing complex image details by simply yet efficiently integrating multi-layer features from ViTs.
Language: Python - Size: 22.2 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 26 - Forks: 2

xyliu-cs/ConflictVIS
Repository for ConflictVIS benchmark
Size: 587 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

ZebangCheng/Emotion-LLaMA
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
Language: Python - Size: 12.6 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 95 - Forks: 9

zjunlp/Deco
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
Language: Python - Size: 17.6 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 14 - Forks: 0

claws-lab/MMSoc
We introduce MM-Soc, a comprehensive benchmark designed to evaluate MLLMs' understanding of multimodal social media content.
Language: Python - Size: 12.7 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

alexander-moore/vlm
Composition of Multimodal Language Models From Scratch
Language: Jupyter Notebook - Size: 1.32 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 4 - Forks: 0

dvlab-research/LLMGA
This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant', ECCV2024 Oral
Language: Python - Size: 15.1 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 431 - Forks: 29

med-air/PICG2scoring
[MICCAI'24] Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring
Language: Python - Size: 445 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 5 - Forks: 0

sterzhang/image-textualization
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions
Language: Python - Size: 180 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 120 - Forks: 7

BAAI-DCAI/Bunny
A family of lightweight multimodal models.
Language: Python - Size: 28.9 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 797 - Forks: 61

Hon-Wong/Elysium
[ECCV2024] Elysium: Exploring Object-level Perception in Videos via MLLM
Language: Python - Size: 112 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 29 - Forks: 1

BAAI-DCAI/DataOptim
A collection of visual instruction tuning datasets.
Language: Python - Size: 51.8 KB - Last synced at: 9 months ago - Pushed at: about 1 year ago - Stars: 72 - Forks: 3

VisualWebBench/VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
Language: Python - Size: 3.17 MB - Last synced at: 10 months ago - Pushed at: 11 months ago - Stars: 36 - Forks: 1

gokayfem/ComfyUI_VLM_nodes
Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation
Language: Python - Size: 285 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 235 - Forks: 14

360CVGroup/SEEChat
Multimodal chatbot with computer vision capabilities integrated
Language: Python - Size: 7.92 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 94 - Forks: 8

NorbertDDD/AISurveyPapers
Large Visual Language Model (LVLM), Large Language Model (LLM), Multimodal Large Language Model (MLLM), Alignment, Agent, AI System, Survey
Size: 52.7 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 11 - Forks: 0

zzq2000/MIKO
MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery
Language: Python - Size: 111 MB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 16 - Forks: 0

CircleRadon/Osprey
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
Language: Python - Size: 23.8 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 665 - Forks: 38

xirui-li/attacks-on-LLMs
Awesome list for attacks on large language models.
Size: 83 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

isLinXu/MLLM-Research-Learn
Conducting learning and research on MLLM based on the MME rankings.
Size: 5.57 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

X-PLUG/mPLUG-HalOwl
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigation
Language: Python - Size: 13.9 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 31 - Forks: 0

kassy11/Awesome_NLP_PaperList
🤖 A list of NLP-related papers on GitHub
Size: 1.92 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Ahnsun/merlin
Merlin: Empowering Multimodal LLMs with Foresight Minds
Size: 967 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 43 - Forks: 0
