An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: mllm

yuecao0119/MMFuser

The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". MMFuser addresses the limitations of current MLLMs in capturing complex image details by simply yet efficiently integrating multi-layer features from ViTs.

Language: Python - Size: 22.2 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 26 - Forks: 2

xyliu-cs/ConflictVIS

Repository for ConflictVIS benchmark

Size: 587 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

ZebangCheng/Emotion-LLaMA

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

Language: Python - Size: 12.6 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 95 - Forks: 9

zjunlp/Deco

MLLM Can See? Dynamic Correction Decoding for Hallucination Mitigation

Language: Python - Size: 17.6 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 14 - Forks: 0

claws-lab/MMSoc

We introduce MM-Soc, a comprehensive benchmark designed to evaluate MLLMs' understanding of multimodal social media content.

Language: Python - Size: 12.7 KB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

alexander-moore/vlm

Composition of Multimodal Language Models From Scratch

Language: Jupyter Notebook - Size: 1.32 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 4 - Forks: 0

dvlab-research/LLMGA

This project is the official implementation of 'LLMGA: Multimodal Large Language Model based Generation Assistant', ECCV2024 Oral

Language: Python - Size: 15.1 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 431 - Forks: 29

med-air/PICG2scoring

[MICCAI'24] Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring

Language: Python - Size: 445 KB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 5 - Forks: 0

sterzhang/image-textualization

Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions

Language: Python - Size: 180 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 120 - Forks: 7

BAAI-DCAI/Bunny

A family of lightweight multimodal models.

Language: Python - Size: 28.9 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 797 - Forks: 61

Hon-Wong/Elysium

[ECCV2024] Elysium: Exploring Object-level Perception in Videos via MLLM

Language: Python - Size: 112 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 29 - Forks: 1

BAAI-DCAI/DataOptim

A collection of visual instruction tuning datasets.

Language: Python - Size: 51.8 KB - Last synced at: 9 months ago - Pushed at: about 1 year ago - Stars: 72 - Forks: 3

VisualWebBench/VisualWebBench

Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"

Language: Python - Size: 3.17 MB - Last synced at: 10 months ago - Pushed at: 11 months ago - Stars: 36 - Forks: 1

gokayfem/ComfyUI_VLM_nodes

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

Language: Python - Size: 285 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 235 - Forks: 14

360CVGroup/SEEChat

Multimodal chatbot with integrated computer vision capabilities

Language: Python - Size: 7.92 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 94 - Forks: 8

NorbertDDD/AISurveyPapers

Large Visual Language Model (LVLM), Large Language Model (LLM), Multimodal Large Language Model (MLLM), Alignment, Agent, AI System, Survey

Size: 52.7 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 11 - Forks: 0

zzq2000/MIKO

MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery

Language: Python - Size: 111 MB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 16 - Forks: 0

CircleRadon/Osprey

[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

Language: Python - Size: 23.8 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 665 - Forks: 38

xirui-li/attacks-on-LLMs

Awesome list for attacks on large language models.

Size: 83 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 1 - Forks: 0

isLinXu/MLLM-Research-Learn

Learning and research on MLLMs based on the MME rankings.

Size: 5.57 MB - Last synced at: 1 day ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

X-PLUG/mPLUG-HalOwl

mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating

Language: Python - Size: 13.9 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 31 - Forks: 0

kassy11/Awesome_NLP_PaperList

🤖 A list of NLP-related papers on GitHub

Size: 1.92 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Ahnsun/merlin

Merlin: Empowering Multimodal LLMs with Foresight Minds

Size: 967 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 43 - Forks: 0