GitHub topics: mllm
atfortes/Awesome-LLM-Reasoning
Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓
Size: 425 KB - Last synced at: about 4 hours ago - Pushed at: about 1 month ago - Stars: 2,986 - Forks: 169

mindspore-lab/mindway
The way ('道'): a focus on multimodal large language models (MLLMs)
Language: Python - Size: 532 KB - Last synced at: about 15 hours ago - Pushed at: about 23 hours ago - Stars: 4 - Forks: 14

FoundationVision/Groma
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
Language: Python - Size: 13.5 MB - Last synced at: about 14 hours ago - Pushed at: 11 months ago - Stars: 560 - Forks: 44

VARGPT-family/VARGPT
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
Language: Python - Size: 5.72 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 306 - Forks: 14

pipixin321/Awesome-Video-MLLMs
🔥🔥🔥 Awesome MLLMs/Benchmarks for Short/Long/Streaming Video Understanding 📹
Size: 6.84 KB - Last synced at: about 15 hours ago - Pushed at: 3 months ago - Stars: 15 - Forks: 1

jiazhen-code/PhD
[CVPR25] A ChatGPT-Prompted Visual hallucination Evaluation Dataset, featuring over 100,000 data samples and four advanced evaluation modes. The dataset includes extensive contextual descriptions, counterintuitive images, and clear indicators of hallucination items.
Size: 28 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 14 - Forks: 0

jingyi0000/R1-VL
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Size: 2.26 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 209 - Forks: 0

simular-ai/Agent-S
Agent S: an open agentic framework that uses computers like a human
Language: Python - Size: 38.9 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2,308 - Forks: 249

aidayang/MagicQuill-OneClick
MagicQuill intelligent interactive image editing software: a no-install, one-click launch bundle
Size: 75.2 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

Hon-Wong/VoRA
[Fully open] [Encoder-free MLLM] Vision as LoRA
Language: Python - Size: 9.08 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 103 - Forks: 6

The-Martyr/Awesome-Multimodal-Reasoning
Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models
Size: 60.5 KB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 19 - Forks: 0

ant-research/MagicQuill
[CVPR'25] Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System
Language: Python - Size: 42.6 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 3,281 - Forks: 331

coderonion/awesome-llm-and-aigc
🚀🚀🚀 A collection of some awesome public projects about Large Language Models (LLM), Vision Language Models (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), and the related datasets and applications.
Size: 266 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 657 - Forks: 59

YingqingHe/Awesome-LLMs-meet-Multimodal-Generation
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).
Language: HTML - Size: 12.7 MB - Last synced at: 5 days ago - Pushed at: 16 days ago - Stars: 455 - Forks: 26

microsoft/eureka-ml-insights
A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.
Language: Python - Size: 20.1 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 121 - Forks: 20

taco-group/OpenEMMA
OpenEMMA, a permissively licensed open source "reproduction" of Waymo’s EMMA model.
Language: Python - Size: 65.3 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 613 - Forks: 77

Atomic-man007/Awesome_Multimodel_LLM
Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Models (MLLM). It covers datasets, tuning techniques, in-context learning, visual reasoning, foundational models, and more. Stay updated with the latest advancements.
Size: 2.64 MB - Last synced at: 4 days ago - Pushed at: about 1 month ago - Stars: 315 - Forks: 21

NVlabs/EAGLE
Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs
Language: Python - Size: 13.9 MB - Last synced at: 5 days ago - Pushed at: 11 days ago - Stars: 655 - Forks: 39

JackYFL/awesome-VLLMs
This repository collects papers on VLLM applications. New papers are added on an irregular basis.
Size: 893 KB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 86 - Forks: 8

TideDra/VL-RLHF
An RLHF Infrastructure for Vision-Language Models
Language: Python - Size: 3.8 MB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 171 - Forks: 7

coderonion/awesome-yolo-object-detection
🚀🚀🚀 A collection of some awesome public YOLO object detection series projects and the related object detection datasets.
Size: 387 KB - Last synced at: 6 days ago - Pushed at: 10 days ago - Stars: 1,448 - Forks: 202

baaivision/EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
Language: Python - Size: 6.95 MB - Last synced at: 7 days ago - Pushed at: about 2 months ago - Stars: 320 - Forks: 8

TIGER-AI-Lab/Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR2024]
Language: Python - Size: 83.6 MB - Last synced at: 8 days ago - Pushed at: 27 days ago - Stars: 211 - Forks: 20

X-PLUG/mPLUG-DocOwl
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Language: Python - Size: 105 MB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 2,151 - Forks: 128

InternLM/InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Language: Python - Size: 199 MB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 2,805 - Forks: 171

X-PLUG/MobileAgent
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Language: Python - Size: 383 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 4,034 - Forks: 404

NExT-GPT/NExT-GPT
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
Language: Python - Size: 127 MB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 3,480 - Forks: 349

cambrian-mllm/cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Language: Python - Size: 1.99 MB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 1,885 - Forks: 129

VARGPT-family/VARGPT-v1.1
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
Language: Python - Size: 19.7 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 116 - Forks: 6

aimagelab/ReflectiVA
[CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
Language: Python - Size: 7.17 MB - Last synced at: 11 days ago - Pushed at: 20 days ago - Stars: 21 - Forks: 0

magic-research/Sa2VA
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Language: Python - Size: 68.5 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 1,013 - Forks: 65

microsoft/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Language: Python - Size: 66.4 MB - Last synced at: 12 days ago - Pushed at: about 2 months ago - Stars: 21,029 - Forks: 2,616

Ruiyang-061X/Awesome-MLLM-Reasoning
📖 Curated list on the reasoning ability of MLLMs, including OpenAI o1, OpenAI o3-mini, and Slow-Thinking.
Size: 7.81 KB - Last synced at: 8 days ago - Pushed at: 2 months ago - Stars: 4 - Forks: 0

VITA-MLLM/Woodpecker
✨✨Woodpecker: Hallucination Correction for Multimodal Large Language Models
Language: Python - Size: 21.2 MB - Last synced at: 12 days ago - Pushed at: 4 months ago - Stars: 635 - Forks: 31

ExplainableML/vla-gender-bias
[ICLR 2025] Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
Language: Python - Size: 3.11 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 2 - Forks: 0

Ruiyang-061X/Awesome-MLLM-Uncertainty
✨ A curated list of papers on uncertainty in multimodal large language models (MLLMs).
Size: 381 KB - Last synced at: 10 days ago - Pushed at: 18 days ago - Stars: 42 - Forks: 0

pds-dpo/pds-dpo
Official GitHub repository of PDS-DPO: Multimodal Preference Data Synthetic Alignment with Reward Model
Language: Python - Size: 6.71 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 6 - Forks: 0

BUAADreamer/MLLM-Finetuning-Demo
Demo code for fine-tuning multimodal LLMs with LLaMA-Factory
Language: Python - Size: 61.5 KB - Last synced at: 9 days ago - Pushed at: 7 months ago - Stars: 32 - Forks: 2

ChocoWu/Any2Caption
This is the project webpage for 'Any2Caption'.
Size: 4.51 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1 - Forks: 0

SkyworkAI/Skywork-R1V
Pioneering Multimodal Reasoning with CoT
Language: Python - Size: 30.5 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 1,365 - Forks: 136

thu-ml/MMTrustEval
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust, NeurIPS 2024 Track Datasets and Benchmarks)
Language: Python - Size: 15.8 MB - Last synced at: 15 days ago - Pushed at: 25 days ago - Stars: 145 - Forks: 10

CCAI-Lab/Awesome-GUI-Agents
A curated collection of resources, tools, and frameworks for developing GUI Agents.
Size: 29.4 MB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 10 - Forks: 0

MING-ZCH/CII-Bench
Can MLLMs Understand the Deep Implication Behind Chinese Images?
Language: Python - Size: 135 MB - Last synced at: 14 days ago - Pushed at: about 2 months ago - Stars: 11 - Forks: 1

FoundationVision/GenerateU
[CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection
Language: Python - Size: 14.4 MB - Last synced at: 14 days ago - Pushed at: 22 days ago - Stars: 167 - Forks: 7

AIDC-AI/Wings
The code repository for "Wings: Learning Multimodal LLMs without Text-only Forgetting" [NeurIPS 2024]
Language: Python - Size: 2.85 MB - Last synced at: 18 days ago - Pushed at: 4 months ago - Stars: 17 - Forks: 1

zjrwtx/SFT-data-builder
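Uses free LLM APIs together with your private-domain data to generate SFT training data at no cost; supports the training data formats of tools such as LLaMA-Factory. Synthetic data.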
Language: JavaScript - Size: 502 KB - Last synced at: 15 days ago - Pushed at: 5 months ago - Stars: 154 - Forks: 15

FreedomIntelligence/TRIM
We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their performance.
Language: Python - Size: 26.9 MB - Last synced at: 20 days ago - Pushed at: 4 months ago - Stars: 12 - Forks: 0

NiuTrans/Vision-LLM-Alignment
This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.
Language: Python - Size: 153 MB - Last synced at: 14 days ago - Pushed at: 6 months ago - Stars: 104 - Forks: 8

DAMO-NLP-SG/VideoRefer
[CVPR 2025] The code for "VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM"
Language: Python - Size: 130 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 176 - Forks: 9

VITA-MLLM/Long-VITA
✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
Language: Python - Size: 3.85 MB - Last synced at: 20 days ago - Pushed at: about 1 month ago - Stars: 265 - Forks: 29

WILLOSCAR/Awesome-HCI-LLM
Awesome-HCI (Ubiquitous, LLM, MLLM, Agent, RAG, Embodied-AI)
Size: 14.6 KB - Last synced at: 1 day ago - Pushed at: about 2 months ago - Stars: 5 - Forks: 0

hewei2001/ReachQA
Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs"
Language: Python - Size: 9.82 MB - Last synced at: 14 days ago - Pushed at: 6 months ago - Stars: 51 - Forks: 0

manycore-research/SpatialLM
SpatialLM: Large Language Model for Spatial Understanding
Language: Python - Size: 6.22 MB - Last synced at: 26 days ago - Pushed at: 30 days ago - Stars: 2,182 - Forks: 145

tychenjiajun/exif-ai
A Node.js CLI and library that uses OpenAI, Ollama, ZhipuAI, Google Gemini or Coze to write AI-generated image descriptions and/or tags to EXIF metadata by its content.
Language: TypeScript - Size: 14.7 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 14 - Forks: 4

nidhiyashwanth/SpatialLM
Trying out SpatialLM (SpatialLM: Large Language Model for Spatial Understanding). Impressed with results 💖
Language: Jupyter Notebook - Size: 41 KB - Last synced at: 16 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

eternal8080/MV-MATH
Description for MV-MATH
Language: Python - Size: 2.47 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 7 - Forks: 0

sugarandgugu/Awesome-GUIAgent-Perception
Awesome-LLM: a curated list of GUIAgent's Perception
Size: 12.7 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

baaivision/DenseFusion
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
Language: Python - Size: 18.1 MB - Last synced at: 14 days ago - Pushed at: 4 months ago - Stars: 137 - Forks: 1

OrvilleX/MachineLearning
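This project is application-oriented, covering basic machine learning, deep learning, object detection, and the latest large models. It uses mature third-party libraries, open-source pretrained models, and the latest techniques from related papers, with the aim of recording the learning process and sharing it so that more people can use it directly.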
Language: Jupyter Notebook - Size: 11.4 MB - Last synced at: 11 days ago - Pushed at: about 2 months ago - Stars: 66 - Forks: 22

BUAADreamer/Chinese-LLaVA-Med
Chinese medical multimodal large model: Large Chinese Language-and-Vision Assistant for BioMedicine
Language: Python - Size: 2.26 MB - Last synced at: 16 days ago - Pushed at: 11 months ago - Stars: 76 - Forks: 4

vbdi/casp
[CVPR 2025] CASP: Compression of Large Multimodal Models Based on Attention Sparsity
Language: Python - Size: 361 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

cilabuniba/i-dream-my-painting
[WACV 2025] I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting
Language: Jupyter Notebook - Size: 58.7 MB - Last synced at: 17 days ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 0

Coobiw/MPP-LLaVA
Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-training-like MLLM on RTX3090/4090 24GB.
Language: Jupyter Notebook - Size: 73.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 420 - Forks: 23

itsvaibhav01/immune-web
Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
Language: JavaScript - Size: 97.9 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

lerogo/MMGenBench
Official repository of MMGenBench
Language: Python - Size: 19.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 119 - Forks: 5

otroshi/FoundationModelsBiometrics
Foundation Models and Biometrics: A Survey and Outlook
Size: 0 Bytes - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1 - Forks: 0

KwaiVGI/Uniaa
Unified Multi-modal IAA Baseline and Benchmark
Language: Python - Size: 9.12 MB - Last synced at: about 1 hour ago - Pushed at: 7 months ago - Stars: 74 - Forks: 5

USC-GVL/PhysBench
[ICLR 2025] Official implementation and benchmark evaluation repository of "PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding"
Language: Python - Size: 14.3 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 33 - Forks: 1

bigai-nlco/VideoTGB
[EMNLP 2024] A Video Chat Agent with Temporal Prior
Language: Python - Size: 51.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 29 - Forks: 2

MSR3D/MSR3D
[NeurIPS 2024] Official code repository for MSR3D paper
Language: Python - Size: 11.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 37 - Forks: 2

lucasjinreal/Namo-R1
A CPU Realtime VLM in 500M. Surpassed Moondream2 and SmolVLM. Training from scratch with ease.
Language: Python - Size: 1.13 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 129 - Forks: 14

wendell0218/GVA-Survey
Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms
Size: 6.16 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 14 - Forks: 1

chunhuizng/mllm-video-captioner
We use RL to train a SOTA MLLM captioner.
Language: Python - Size: 1.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 0

Thisisus7/ING-VP
An Interactive Game-based Vision Planning benchmark
Language: Python - Size: 2.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 8 - Forks: 0

DistRL-lab/distrl-open
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
Language: Python - Size: 782 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 17 - Forks: 0

Now-Join-Us/OmniEvalKit (fork of AIDC-AI/M3Bench)
The code repository for "OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions"
Language: Python - Size: 3.82 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 13 - Forks: 2

WebPAI/Interaction2Code
Interaction2Code: How Far Are We From Automatic Interactive Webpage Generation?
Language: JavaScript - Size: 73.4 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 28 - Forks: 1

taco-group/Re-Align
A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.
Language: Python - Size: 18.5 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 9 - Forks: 0

X-PLUG/mPLUG-2
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Language: Python - Size: 2.36 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 223 - Forks: 19

zhipeixu/FakeShield
🔥 [ICLR 2025] FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models
Language: Python - Size: 1.79 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 120 - Forks: 11

X-PLUG/Youku-mPLUG
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Language: Python - Size: 15.1 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 292 - Forks: 11

XuYunqiu/MC-Bench
official repo of "MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs"
Language: Python - Size: 47.3 MB - Last synced at: about 2 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

gyunggyung/OpenMLLM (fork of ggerganov/llama.cpp)
Open Source + Multilingual MLLM + Fine-tuning + Distillation + More efficient models and learning + ?
Language: C++ - Size: 24.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 18 - Forks: 5

IDEA-Research/ChatRex
Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding
Language: Python - Size: 8.8 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 124 - Forks: 3

cuiyuheng/MiniCPM-V (fork of OpenBMB/MiniCPM-o)
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Size: 301 MB - Last synced at: 25 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

Raymond-Qiancx/Awesome-Multimodal-Machine-Learning-Papers
Taxonomy and listing of current powerful studies in Advanced Multimodal Machine Learning.
Size: 690 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 1

BUAADreamer/Qwen2-VL-History
A LLaMA-Factory fine-tuning example for Qwen2-VL in the culture and tourism domain (historical literature and museums)
Size: 73.8 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 6 - Forks: 2

CircleRadon/TokenPacker
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
Language: Python - Size: 40.8 MB - Last synced at: 3 months ago - Pushed at: 4 months ago - Stars: 237 - Forks: 9

automatika-robotics/roboml
RoboML is an aggregator package written for quickly deploying open source ML models for robotics use cases
Language: Python - Size: 197 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 4 - Forks: 0

lt-asset/REPOCOD
Can Language Models Replace Programmers? RepoCod Says ‘Not Yet’, by Shanchao Liang, Yiran Hu, Nan Jiang, and Lin Tan
Language: Python - Size: 3.95 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 15 - Forks: 1

hemangjoshi37a/FactoryAIOptimize
AI-Powered Multi-Camera Vision LLM System for Factory Optimization
Size: 9.77 KB - Last synced at: about 1 month ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

SkyworkAI/Vitron
NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Language: Python - Size: 667 MB - Last synced at: 4 months ago - Pushed at: 6 months ago - Stars: 426 - Forks: 24

parsee-ai/parsee-datasets
Datasets, case studies and benchmarks for extracting structured information from PDFs, HTML files or images, created by the Parsee.ai team. Datasets also on Hugging Face: https://huggingface.co/parsee-ai
Language: Jupyter Notebook - Size: 167 MB - Last synced at: 3 months ago - Pushed at: 11 months ago - Stars: 18 - Forks: 1

showlab/VisInContext
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
Language: Python - Size: 1010 KB - Last synced at: 18 days ago - Pushed at: 6 months ago - Stars: 14 - Forks: 2

Gnonymous/Simple-MLLM
This is a simple example of deploying a multimodal large language model (MLLM) locally, with support for multiple input modalities including image, text, and voice (voice support is being updated).
Language: Python - Size: 1000 Bytes - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

waltonfuture/Diff-eRank
[NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models
Language: Python - Size: 39.1 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 30 - Forks: 2

xirui-li/MOSSBench
An implementation for MLLM oversensitivity evaluation
Language: JavaScript - Size: 479 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 5 - Forks: 1

turningpoint-ai/MOSSBench (fork of xirui-li/MOSSBench)
This is the official implementation (code, data) of the paper "MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?"
Language: JavaScript - Size: 479 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

yuecao0119/MMInstruct
The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity". The MMInstruct dataset includes 973K instructions from 24 domains and four instruction types.
Language: Python - Size: 1.61 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 31 - Forks: 1

bz-lab/AUITestAgent
AUITestAgent is the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification.
Size: 368 MB - Last synced at: 5 months ago - Pushed at: 9 months ago - Stars: 148 - Forks: 10
