GitHub topics: mllm

Repositories

atfortes/Awesome-LLM-Reasoning

Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓

Size: 425 KB - Last synced at: about 4 hours ago - Pushed at: about 1 month ago - Stars: 2,986 - Forks: 169

mindspore-lab/mindway

The way ('道'); focused on multimodal large language models (MLLMs)

Language: Python - Size: 532 KB - Last synced at: about 15 hours ago - Pushed at: about 23 hours ago - Stars: 4 - Forks: 14

FoundationVision/Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization

Language: Python - Size: 13.5 MB - Last synced at: about 14 hours ago - Pushed at: 11 months ago - Stars: 560 - Forks: 44

VARGPT-family/VARGPT

VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model

Language: Python - Size: 5.72 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 306 - Forks: 14

pipixin321/Awesome-Video-MLLMs

🔥🔥🔥 Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding 📹

Size: 6.84 KB - Last synced at: about 15 hours ago - Pushed at: 3 months ago - Stars: 15 - Forks: 1

[CVPR25] A ChatGPT-Prompted Visual hallucination Evaluation Dataset, featuring over 100,000 data samples and four advanced evaluation modes. The dataset includes extensive contextual descriptions, counterintuitive images, and clear indicators of hallucination items.

Size: 28 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 14 - Forks: 0

jingyi0000/R1-VL

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Size: 2.26 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 209 - Forks: 0

simular-ai/Agent-S

Agent S: an open agentic framework that uses computers like a human

Language: Python - Size: 38.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2,308 - Forks: 249

aidayang/MagicQuill-OneClick

MagicQuill intelligent interactive image editing software: a no-install, one-click launch bundle

Size: 75.2 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

Hon-Wong/VoRA

[Fully open] [Encoder-free MLLM] Vision as LoRA

Language: Python - Size: 9.08 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 103 - Forks: 6

The-Martyr/Awesome-Multimodal-Reasoning

Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models

Size: 60.5 KB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 19 - Forks: 0

ant-research/MagicQuill

[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System

Language: Python - Size: 42.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 3,281 - Forks: 331

coderonion/awesome-llm-and-aigc

🚀🚀🚀 A collection of some awesome public projects about Large Language Models (LLM), Vision Language Models (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), and the related datasets and applications.

Size: 266 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 657 - Forks: 59

YingqingHe/Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

Language: HTML - Size: 12.7 MB - Last synced at: 5 days ago - Pushed at: 16 days ago - Stars: 455 - Forks: 26

microsoft/eureka-ml-insights

A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.

Language: Python - Size: 20.1 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 121 - Forks: 20

taco-group/OpenEMMA

OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.

Language: Python - Size: 65.3 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 613 - Forks: 77

Atomic-man007/Awesome_Multimodel_LLM

Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.

Size: 2.64 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 315 - Forks: 21

NVlabs/EAGLE

Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs

Language: Python - Size: 13.9 MB - Last synced at: 5 days ago - Pushed at: 11 days ago - Stars: 655 - Forks: 39

JackYFL/awesome-VLLMs

This repository collects papers on VLLM applications. New papers are added on an irregular basis.

Size: 893 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 86 - Forks: 8

TideDra/VL-RLHF

An RLHF Infrastructure for Vision-Language Models

Language: Python - Size: 3.8 MB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 171 - Forks: 7

coderonion/awesome-yolo-object-detection

🚀🚀🚀 A collection of some awesome public YOLO object detection series projects and the related object detection datasets.

Size: 387 KB - Last synced at: 6 days ago - Pushed at: 10 days ago - Stars: 1,448 - Forks: 202

baaivision/EVE

EVE Series: Encoder-Free Vision-Language Models from BAAI

Language: Python - Size: 6.95 MB - Last synced at: 7 days ago - Pushed at: about 2 months ago - Stars: 320 - Forks: 8

TIGER-AI-Lab/Mantis

Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR2024]

Language: Python - Size: 83.6 MB - Last synced at: 8 days ago - Pushed at: 27 days ago - Stars: 211 - Forks: 20

X-PLUG/mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Language: Python - Size: 105 MB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 2,151 - Forks: 128

InternLM/InternLM-XComposer

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Language: Python - Size: 199 MB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 2,805 - Forks: 171

X-PLUG/MobileAgent

Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

Language: Python - Size: 383 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 4,034 - Forks: 404

NExT-GPT/NExT-GPT

Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model

Language: Python - Size: 127 MB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 3,480 - Forks: 349

cambrian-mllm/cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Language: Python - Size: 1.99 MB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 1,885 - Forks: 129

VARGPT-family/VARGPT-v1.1

VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning

Language: Python - Size: 19.7 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 116 - Forks: 6

aimagelab/ReflectiVA

[CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering

Language: Python - Size: 7.17 MB - Last synced at: 11 days ago - Pushed at: 20 days ago - Stars: 21 - Forks: 0

magic-research/Sa2VA

🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos

Language: Python - Size: 68.5 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1,013 - Forks: 65

microsoft/unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python - Size: 66.4 MB - Last synced at: 12 days ago - Pushed at: about 2 months ago - Stars: 21,029 - Forks: 2,616

Ruiyang-061X/Awesome-MLLM-Reasoning

📖 Curated list on the reasoning ability of MLLMs, including OpenAI o1, OpenAI o3-mini, and Slow-Thinking.

Size: 7.81 KB - Last synced at: 8 days ago - Pushed at: 2 months ago - Stars: 4 - Forks: 0

VITA-MLLM/Woodpecker

✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models

Language: Python - Size: 21.2 MB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 635 - Forks: 31

ExplainableML/vla-gender-bias

[ICLR 2025] Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)

Language: Python - Size: 3.11 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 2 - Forks: 0

Ruiyang-061X/Awesome-MLLM-Uncertainty

✨A curated list of papers on the uncertainty in multi-modal large language model (MLLM).

Size: 381 KB - Last synced at: 10 days ago - Pushed at: 18 days ago - Stars: 42 - Forks: 0

pds-dpo/pds-dpo

Official GitHub repository of PDS-DPO: Multimodal Preference Data Synthetic Alignment with Reward Model

Language: Python - Size: 6.71 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 6 - Forks: 0

BUAADreamer/MLLM-Finetuning-Demo

Demo code for fine-tuning multimodal LLMs with LLaMA-Factory

Language: Python - Size: 61.5 KB - Last synced at: 9 days ago - Pushed at: 7 months ago - Stars: 32 - Forks: 2

ChocoWu/Any2Caption

This is the project webpage for 'Any2Caption'.

Size: 4.51 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1 - Forks: 0

SkyworkAI/Skywork-R1V

Pioneering Multimodal Reasoning with CoT

Language: Python - Size: 30.5 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1,365 - Forks: 136

thu-ml/MMTrustEval

A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks)

Language: Python - Size: 15.8 MB - Last synced at: 15 days ago - Pushed at: 25 days ago - Stars: 145 - Forks: 10

CCAI-Lab/Awesome-GUI-Agents

A curated collection of resources, tools, and frameworks for developing GUI Agents.

Size: 29.4 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 10 - Forks: 0

MING-ZCH/CII-Bench

Can MLLMs Understand the Deep Implication Behind Chinese Images?

Language: Python - Size: 135 MB - Last synced at: 14 days ago - Pushed at: about 2 months ago - Stars: 11 - Forks: 1

FoundationVision/GenerateU

[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection

Language: Python - Size: 14.4 MB - Last synced at: 14 days ago - Pushed at: 22 days ago - Stars: 167 - Forks: 7

AIDC-AI/Wings

The code repository for "Wings: Learning Multimodal LLMs without Text-only Forgetting" [NeurIPS 2024]

Language: Python - Size: 2.85 MB - Last synced at: 18 days ago - Pushed at: 4 months ago - Stars: 17 - Forks: 1

zjrwtx/SFT-data-builder

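Uses free LLM APIs together with your private-domain data to generate SFT training data (entirely free of charge). Supports the training data formats of tools such as LLaMA-Factory. Synthetic data.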

Language: JavaScript - Size: 502 KB - Last synced at: 15 days ago - Pushed at: 5 months ago - Stars: 154 - Forks: 15

FreedomIntelligence/TRIM

We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their performance.

Language: Python - Size: 26.9 MB - Last synced at: 20 days ago - Pushed at: 4 months ago - Stars: 12 - Forks: 0

NiuTrans/Vision-LLM-Alignment

This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.

Language: Python - Size: 153 MB - Last synced at: 14 days ago - Pushed at: 6 months ago - Stars: 104 - Forks: 8

DAMO-NLP-SG/VideoRefer

[CVPR 2025] The code for "VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM"

Language: Python - Size: 130 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 176 - Forks: 9

VITA-MLLM/Long-VITA

✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

Language: Python - Size: 3.85 MB - Last synced at: 20 days ago - Pushed at: about 1 month ago - Stars: 265 - Forks: 29

WILLOSCAR/Awesome-HCI-LLM

Awesome-HCI (Ubiquitous, LLM, MLLM, Agent, RAG, Embodied-AI)

Size: 14.6 KB - Last synced at: 1 day ago - Pushed at: about 2 months ago - Stars: 5 - Forks: 0

hewei2001/ReachQA

Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs"

Language: Python - Size: 9.82 MB - Last synced at: 14 days ago - Pushed at: 6 months ago - Stars: 51 - Forks: 0

manycore-research/SpatialLM

SpatialLM: Large Language Model for Spatial Understanding

Language: Python - Size: 6.22 MB - Last synced at: 26 days ago - Pushed at: 30 days ago - Stars: 2,182 - Forks: 145

tychenjiajun/exif-ai

A Node.js CLI and library that uses OpenAI, Ollama, ZhipuAI, Google Gemini or Coze to write AI-generated image descriptions and/or tags to EXIF metadata by its content.

Language: TypeScript - Size: 14.7 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 14 - Forks: 4

nidhiyashwanth/SpatialLM

Trying out SpatialLM (SpatialLM: Large Language Model for Spatial Understanding). Impressed with results 💖

Language: Jupyter Notebook - Size: 41 KB - Last synced at: 16 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

eternal8080/MV-MATH

Description for MV-MATH

Language: Python - Size: 2.47 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 7 - Forks: 0

sugarandgugu/Awesome-GUIAgent-Perception

Awesome-LLM: a curated list of GUIAgent's Perception

Size: 12.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

baaivision/DenseFusion

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

Language: Python - Size: 18.1 MB - Last synced at: 14 days ago - Pushed at: 4 months ago - Stars: 137 - Forks: 1

OrvilleX/MachineLearning

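This project is application-oriented, covering everything from basic machine learning and deep learning to object detection and the latest large models. It uses mature third-party libraries, open-source pretrained models, and recent techniques from related papers, with the aim of documenting the learning process and sharing it so that more people can use it directly.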

Language: Jupyter Notebook - Size: 11.4 MB - Last synced at: 11 days ago - Pushed at: about 2 months ago - Stars: 66 - Forks: 22

BUAADreamer/Chinese-LLaVA-Med

Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine

Language: Python - Size: 2.26 MB - Last synced at: 16 days ago - Pushed at: 11 months ago - Stars: 76 - Forks: 4

vbdi/casp

[CVPR 2025] CASP: Compression of Large Multimodal Models Based on Attention Sparsity

Language: Python - Size: 361 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

cilabuniba/i-dream-my-painting

[WACV 2025] I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting

Language: Jupyter Notebook - Size: 58.7 MB - Last synced at: 17 days ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 0

Coobiw/MPP-LLaVA

Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.

Language: Jupyter Notebook - Size: 73.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 420 - Forks: 23

itsvaibhav01/immune-web

Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment

Language: JavaScript - Size: 97.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

lerogo/MMGenBench

Official repository of MMGenBench

Language: Python - Size: 19.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 119 - Forks: 5

otroshi/FoundationModelsBiometrics

Foundation Models and Biometrics: A Survey and Outlook

Size: 0 Bytes - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

KwaiVGI/Uniaa

Unified Multi-modal IAA Baseline and Benchmark

Language: Python - Size: 9.12 MB - Last synced at: about 1 hour ago - Pushed at: 7 months ago - Stars: 74 - Forks: 5

USC-GVL/PhysBench

[ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding>

Language: Python - Size: 14.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 33 - Forks: 1

bigai-nlco/VideoTGB

[EMNLP 2024] A Video Chat Agent with Temporal Prior

Language: Python - Size: 51.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 29 - Forks: 2

MSR3D/MSR3D

[NeurIPS 2024] Official code repository for MSR3D paper

Language: Python - Size: 11.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 37 - Forks: 2

lucasjinreal/Namo-R1

A CPU Realtime VLM in 500M. Surpassed Moondream2 and SmolVLM. Training from scratch with ease.

Language: Python - Size: 1.13 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 129 - Forks: 14

wendell0218/GVA-Survey

Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms

Size: 6.16 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 14 - Forks: 1

chunhuizng/mllm-video-captioner

We use RL to train a SOTA MLLM captioner.

Language: Python - Size: 1.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

Thisisus7/ING-VP

An Interactive Game-based Vision Planning benchmark

Language: Python - Size: 2.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 8 - Forks: 0

DistRL-lab/distrl-open

DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents

Language: Python - Size: 782 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 17 - Forks: 0

Now-Join-Us/OmniEvalKit (fork of AIDC-AI/M3Bench)

The code repository for "OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions"

Language: Python - Size: 3.82 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 13 - Forks: 2

WebPAI/Interaction2Code

Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation?

Language: JavaScript - Size: 73.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 28 - Forks: 1

taco-group/Re-Align

A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.

Language: Python - Size: 18.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 9 - Forks: 0

X-PLUG/mPLUG-2

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)

Language: Python - Size: 2.36 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 223 - Forks: 19

zhipeixu/FakeShield

🔥 [ICLR 2025] FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models

Language: Python - Size: 1.79 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 120 - Forks: 11

X-PLUG/Youku-mPLUG

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks

Language: Python - Size: 15.1 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 292 - Forks: 11

XuYunqiu/MC-Bench

Official repo of "MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs"

Language: Python - Size: 47.3 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

gyunggyung/OpenMLLM (fork of ggerganov/llama.cpp)

Open Source + Multilingual MLLM + Fine-tuning + Distillation + More efficient models and learning + ?

Language: C++ - Size: 24.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 18 - Forks: 5

IDEA-Research/ChatRex

Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding

Language: Python - Size: 8.8 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 124 - Forks: 3

cuiyuheng/MiniCPM-V (fork of OpenBMB/MiniCPM-o)

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Size: 301 MB - Last synced at: 25 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

Raymond-Qiancx/Awesome-Multimodal-Machine-Learning-Papers

Taxonomy and listing of current powerful studies in Advanced Multimodal Machine Learning.

Size: 690 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 1

BUAADreamer/Qwen2-VL-History

A case study of fine-tuning Qwen2-VL with LLaMA-Factory in the culture and tourism domain (historical literature and museums)

Size: 73.8 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 6 - Forks: 2

CircleRadon/TokenPacker

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".

Language: Python - Size: 40.8 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 237 - Forks: 9

automatika-robotics/roboml

RoboML is an aggregator package written for quickly deploying open source ML models for robotics use cases

Language: Python - Size: 197 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0

lt-asset/REPOCOD

Can Language Models Replace Programmers? RepoCod Says ‘Not Yet’ - by Shanchao Liang and Yiran Hu and Nan Jiang and Lin Tan

Language: Python - Size: 3.95 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 15 - Forks: 1

hemangjoshi37a/FactoryAIOptimize

AI-Powered Multi-Camera Vision LLM System for Factory Optimization

Size: 9.77 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

SkyworkAI/Vitron

NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Language: Python - Size: 667 MB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 426 - Forks: 24

parsee-ai/parsee-datasets

Datasets, case studies and benchmarks for extracting structured information from PDFs, HTML files or images, created by the Parsee.ai team. Datasets also on Hugging Face: https://huggingface.co/parsee-ai

Language: Jupyter Notebook - Size: 167 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 18 - Forks: 1

showlab/VisInContext

Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning

Language: Python - Size: 1010 KB - Last synced at: 18 days ago - Pushed at: 6 months ago - Stars: 14 - Forks: 2

Gnonymous/Simple-MLLM

This is a simple example of deploying a multimodal large model (MLLM) locally, with support for multiple input modalities including image, text, and voice (being updated).

Language: Python - Size: 1000 Bytes - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

waltonfuture/Diff-eRank

[NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models

Language: Python - Size: 39.1 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 30 - Forks: 2

xirui-li/MOSSBench

An implementation for MLLM oversensitivity evaluation

Language: JavaScript - Size: 479 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 5 - Forks: 1

turningpoint-ai/MOSSBench (fork of xirui-li/MOSSBench)

This is the official implementation (code, data) of the paper "MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?"

Language: JavaScript - Size: 479 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

yuecao0119/MMInstruct

The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity". The MMInstruct dataset includes 973K instructions from 24 domains and four instruction types.

Language: Python - Size: 1.61 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 31 - Forks: 1

bz-lab/AUITestAgent

AUITestAgent is the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification.

Size: 368 MB - Last synced at: 5 months ago - Pushed at: 9 months ago - Stars: 148 - Forks: 10

Related Keywords

mllm (123), llm (54), vlm (29), multimodal (23), multimodal-large-language-models (22), large-language-models (16), benchmark (11), vision-language-model (7), ai (7), machine-learning (7), alignment (7), llava (7), lvlm (6), agent (6), dataset (6), foundation-models (6), chatgpt (6), lmm (6), aigc (5), gpt (5), survey (5), llms (5), multi-modal (5), hallucination (4), datasets (4), computer-vision (4), multimodality (4), instruction-tuning (4), gpt-4 (4), gui (4), dpo (4), vqa (4), awesome (4), llama (4), agents (3), o1 (3), llama-factory (3), qwen (3), transformers (3), reasoning (3), automation (3), supervised-finetuning (3), large-language-model (3), artificial-intelligence (3), vision (3), video (3), object-detection (3), natural-language-processing (3), rlhf (3), deepseek-r1 (3), vllm (3), deepseek (3), cot (3), nlp (3), chain-of-thought (3), visual-instruction-tuning (3), awesome-list (3), image-editing (3), chinese (3), deep-learning (3), reinforcement-learning (3), r1 (3), fine-tuning (3), evaluation (3), video-understanding (3), vision-and-language (3), safety (2), multimodal-deep-learning (2), embodied-ai (2), robotics (2), attack (2), oversensitivity (2), gpt4 (2), rag (2), point-clouds (2), multi-image-understanding (2), vision-language-models (2), clip (2), scene-understanding (2), spatial-intelligence (2), siglip (2), social-media (2), jailbreak (2), privacy (2), post-training (2), image-captioning (2), video-captioning (2), diffusion (2), pretraining (2), huggingface-datasets (2), hallucinations (2), hallucination-mitigation (2), openai (2), synthetic-data (2), multimodal-pretraining (2), video-question-answering (2), chatbot (2), multimodal-agent (2), mobile (2), gpt4v (2)