An open API service providing repository metadata for many open source software ecosystems.

Topic: "multimodal-large-language-models"

Lzcstan/DrugLAMP

A PyTorch-based system for highly accurate drug-target interaction predictions utilizing multi-modal large language models to discern structural affinities in drug-target pairs.

Language: Python - Size: 128 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 32 - Forks: 0

xyz9911/FLAME

[AAAI-25 Oral] Official Implementation of "FLAME: Learning to Navigate with Multimodal LLM in Urban Environments"

Language: Python - Size: 8.57 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 30 - Forks: 3

bigai-nlco/VideoTGB

[EMNLP 2024] A Video Chat Agent with Temporal Prior

Language: Python - Size: 51.6 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 29 - Forks: 2

Czi24/Awesome-MLLM-LLM-Colab

Happy experimenting with MLLM and LLM models!

Language: Jupyter Notebook - Size: 17.9 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 29 - Forks: 2

MileBench/MileBench

This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"

Language: Python - Size: 3.52 MB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 29 - Forks: 1

AlignGPT-VL/AlignGPT

Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"

Language: Python - Size: 1.97 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 28 - Forks: 3

patrick-tssn/VideoHallucer

VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)

Language: Python - Size: 21.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 27 - Forks: 0

declare-lab/MM-InstructEval

This repository contains code to evaluate various multimodal large language models using different instructions across multiple multimodal content comprehension tasks.

Language: Python - Size: 32.6 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 27 - Forks: 1

inst-it/inst-it

Official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning"

Language: Python - Size: 2.66 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 27 - Forks: 0

zjunlp/InstructCell

A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following

Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: 6 days ago - Pushed at: 5 months ago - Stars: 27 - Forks: 5

SuperBruceJia/Awesome-Large-Vision-Language-Model

Awesome Large Vision-Language Model: A Curated List of Large Vision-Language Models

Size: 103 KB - Last synced at: 5 days ago - Pushed at: 9 months ago - Stars: 27 - Forks: 3

Wild-Cooperation-Hub/Awesome-MLLM-Reasoning-Benchmarks

A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.

Size: 89.8 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 26 - Forks: 2

Video-Bench/Video-Bench

Video Generation Benchmark

Language: Python - Size: 10.1 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 25 - Forks: 3

The-Martyr/CausalMM

[ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality

Language: Python - Size: 7.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 25 - Forks: 2

xmed-lab/MedRegA

[ICLR 2025] MedRegA: Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks

Language: Python - Size: 5.61 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 25 - Forks: 1

mlvlab/LLaMo

Official Implementation (PyTorch) of the "LLaMo: Large Language Model-based Molecular Graph Assistant", NeurIPS 2024

Language: Python - Size: 44.9 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 25 - Forks: 1

philfung/computer-use

Try Computer Use on your Mac with a few clicks

Language: Python - Size: 105 KB - Last synced at: 2 months ago - Pushed at: 7 months ago - Stars: 24 - Forks: 2

AstraZeneca/vlm

Official implementation for "Diffusion Instruction Tuning"

Language: Python - Size: 25.5 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 23 - Forks: 2

moucheng2017/SOP-LVM-ICL-Ensemble

[NeurIPS VLM workshop 2024] In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models for Low-Level Workflow Understanding

Language: Python - Size: 834 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 22 - Forks: 3

VachanVY/Transfusion.torch

PyTorch Implementation of Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Language: Python - Size: 2.07 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 21 - Forks: 5

multimodal-ai-lab/DEFAME

Fact-checking system for textual and visual inputs.

Language: Python - Size: 30.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 20 - Forks: 4

SlytherinGe/RSTeller

Vision-Language Dataset for Remote Sensing

Language: Python - Size: 15.5 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 20 - Forks: 3

willxxy/ECG-Byte

[arxiv 2024] ECG-Byte: A Tokenizer for End-to-End Generative Electrocardiogram Language Modeling

Language: Python - Size: 28.5 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 18 - Forks: 0

vbdi/divprune

[CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models

Language: Python - Size: 11 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 18 - Forks: 0

eric-ai-lab/MMWorld

Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"

Language: Python - Size: 1.47 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 18 - Forks: 1

AIDC-AI/Wings

The code repository for "Wings: Learning Multimodal LLMs without Text-only Forgetting" [NeurIPS 2024]

Language: Python - Size: 2.85 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 17 - Forks: 1

ryota-komatsu/slp2025

Materials for the 音学シンポジウム2025 tutorial "Introduction to Multimodal Large Language Models"

Language: Jupyter Notebook - Size: 19.6 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 16 - Forks: 2

willxxy/ECG-Bench

A Unified Framework for Benchmarking Generative Electrocardiogram-Language Models (ELMs)

Language: Python - Size: 6.81 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 15 - Forks: 2

multimindlab/multimind-sdk

Your SDK solves all of this. One interface. Unified logic. Local + hosted models. Fine-tuning. Agent tools. Enterprise-ready. Hybrid RAG. Star 🌟 if you like it!

Language: Python - Size: 46.4 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 15 - Forks: 1

Wang-ML-Lab/interpretable-foundation-models

[ICML 2024] Probabilistic Conceptual Explainers (PACE): Trustworthy Conceptual Explanations for Vision Foundation Models

Language: Python - Size: 52.7 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 15 - Forks: 3

gautierdag/plancraft

Plancraft is a Minecraft environment and agent suite to test planning capabilities in LLMs

Language: Python - Size: 124 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 15 - Forks: 0

The-Martyr/Awesome-Modality-Priors-in-MLLMs

Latest Advances on Modality Priors in Multimodal Large Language Models

Size: 76.2 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 15 - Forks: 1

X-iZhang/Libra

[ACL 2025] ⚖️ Temporally-aware MLLM for Biomedical Radiology Analysis and Report Generation. Flexible toolkit with LLM backbone support, real-time validation, training resumption, and smart model saving.

Language: Python - Size: 13.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 14 - Forks: 1

mlvlab/VidChain

Official Implementation (PyTorch) of the "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning", AAAI 2025

Language: Python - Size: 9.02 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 13 - Forks: 0

PanguIR/MRAGSurvey

A Survey of Multimodal Retrieval-Augmented Generation

Size: 4.92 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 11 - Forks: 1

rohit901/VANE-Bench

[NAACL'25] Contains code and documentation for our VANE-Bench paper.

Language: Python - Size: 38.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 11 - Forks: 1

somvy/multimodal_unlearning

Experiments for our CLEAR benchmark of unlearning methods in a multimodal setup

Language: Python - Size: 112 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 10 - Forks: 0

Manchery/awesome-visual-tokenizer

[WIP🚧] 2025 up-to-date list of resources on visual tokenizers (primarily for visual generation). Give it a star 🌟 if you find it useful.

Size: 6.84 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 10 - Forks: 0

hpc203/Chinese-CLIP-opencv-onnxrun

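Deploys Chinese-CLIP with OpenCV and onnxruntime for text-to-image search: given a sentence describing the desired image, it retrieves matching images from a gallery. Includes both C++ and Python versions.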

Language: C++ - Size: 4.03 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 1

akusayudodograu/Agentic-RAG-Story-Generation-with-Multimodal-GenAI

Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling

Size: 1000 Bytes - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 9 - Forks: 1

declare-lab/Auto-Scaling

[Arxiv 2024] Official Implementation of the paper: "Towards Robust Instruction Tuning on Multimodal Large Language Models"

Language: Jupyter Notebook - Size: 67.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 9 - Forks: 1

eric-ai-lab/MSSBench

[ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"

Language: Python - Size: 1.51 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 9 - Forks: 1

hrlics/LITE

[COLM 2024] LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models

Language: Python - Size: 1.31 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 9 - Forks: 0

OmniMMI/OmniMMI

[CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts

Language: Python - Size: 25.3 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 8 - Forks: 0

mashijie1028/GenHancer

A post-training method to enhance CLIP's fine-grained visual representations with generative models.

Language: Python - Size: 2.56 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 8 - Forks: 0

Z1zs/MMNeuron

Official implementation of "MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model". Our code borrows from Tang's language-specific neurons implementation and nrimsky's logit lens implementation.

Language: Python - Size: 28.3 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 8 - Forks: 0

BillChan226/MJ-Bench

Official implementation for "MJ-BENCH: Is Your Multimodal Reward Model Really a Good Judge?"

Language: Jupyter Notebook - Size: 2.56 GB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 0

Hoar012/TDC-Video

Language: Python - Size: 5.75 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 7 - Forks: 0

CristianoPatricio/CBVLM

Code for the paper "CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification".

Language: Python - Size: 903 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 7 - Forks: 1

inFaaa/Evolver

[COLING 2025🔥] Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection

Language: Python - Size: 202 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 7 - Forks: 0

NKU-MetautoAI/awesome-large-vision-language-models

Advances in recent large vision language models (LVLMs)

Size: 32.7 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 7 - Forks: 0

msamprovalaki/Exploring-Multimodal-Large-Language-Models-for-Medical-Image-Captioning

This repository includes the code for my Master Thesis, which investigates the application of Multimodal Large Language Models (MLLMs) for medical image captioning

Language: Python - Size: 5.45 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 6 - Forks: 0

gaotiexinqu/V2P-Bench

V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction

Language: Python - Size: 26.8 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 6 - Forks: 0

jolibrain/colette

Search and interact locally with technical documents of any kind

Language: HTML - Size: 7.74 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 6 - Forks: 4

AceCHQ/MMIQ

This repo contains evaluation code for MM-IQ benchmark.

Language: Jupyter Notebook - Size: 1.63 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 0

BUAADreamer/Qwen2-VL-History

A LLaMA-Factory fine-tuning case study for Qwen2-VL in the culture and tourism domain (historical literature and museums)

Size: 73.8 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 6 - Forks: 2

abdur75648/V-Zen

V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM Resources

Size: 5.86 KB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 6 - Forks: 3

zjr2000/REVERIE

[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models

Language: Python - Size: 1.12 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 6 - Forks: 0

GerrySant/multimodalhugs

MultimodalHugs is an extension of Hugging Face that offers a generalized framework for training, evaluating, and using multimodal AI models with minimal code differences, ensuring seamless compatibility with Hugging Face pipelines.

Language: Python - Size: 4.38 MB - Last synced at: about 8 hours ago - Pushed at: about 9 hours ago - Stars: 5 - Forks: 2

RainBowLuoCS/Awesome-Unified-Multimodal-Understanding-and-Generation

📰 Must-read papers on Unified Multimodal Understanding and Generation (constantly updating 🤗).

Size: 11.7 KB - Last synced at: 2 days ago - Pushed at: 7 days ago - Stars: 5 - Forks: 0

aliencaocao/vlm-for-memes-aisg

Language: Jupyter Notebook - Size: 15.8 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 5 - Forks: 0

araobp/virtual-showroom

A virtual showroom with virtual promotional models (chatbots) and a 240-degree screen for headset-free VR experiences viewed with the naked eye.

Language: C# - Size: 1.33 GB - Last synced at: 2 months ago - Pushed at: 7 months ago - Stars: 5 - Forks: 0

emanalytic/MultiModal-E-Commerce-Customer-Support-Chatbot

Multimodal Customer Service Chatbot

Language: Python - Size: 154 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 4 - Forks: 1

Hazel-Heejeong-Nam/VAGUE

An official repository of "VAGUE: Visual Contexts Clarify Ambiguous Expressions"

Language: Python - Size: 3.07 MB - Last synced at: 27 days ago - Pushed at: 28 days ago - Stars: 4 - Forks: 0

AMD-AIG-AIMA/gpt-fast

The GPT-Fast for Multimodal Models on AMD GPUs

Language: Python - Size: 6.03 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

NotYuSheng/Multimodal-Large-Language-Model

Localized Multimodal Large Language Model (MLLM) integrated with Streamlit and Ollama for text and image processing tasks.

Language: Python - Size: 7.37 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 4 - Forks: 2

defnecirci/MatSciTableExtract

Extracting structured materials science data from tables using LLMs

Language: Python - Size: 147 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 4 - Forks: 0

Gen-Verse/HermesFlow

HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation

Language: Python - Size: 1.91 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 4 - Forks: 0

alexander-moore/vlm

Composition of Multimodal Language Models From Scratch

Language: Jupyter Notebook - Size: 1.32 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 4 - Forks: 0

Vignesh010101/Intelligent-Health-LLM-System

An Intelligent Health LLM System for Personalized Medication Guidance and Support.

Language: Jupyter Notebook - Size: 620 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 3 - Forks: 0

LAMDA-Tabular/MMTU

The comprehensive MMTU Benchmark for testing MLLMs' capability in Tabular Understanding

Language: Python - Size: 1.34 MB - Last synced at: 15 days ago - Pushed at: 16 days ago - Stars: 3 - Forks: 0

xavier-yu114/Zoom-Refine

Zoom-Refine: Boosting High-Resolution Multimodal Understanding via Localized Zoom and Self-Refinement

Language: Python - Size: 3.64 MB - Last synced at: 17 days ago - Pushed at: 18 days ago - Stars: 3 - Forks: 0

nkkbr/ViCA

This is the official implementation of ViCA2 (Visuospatial Cognitive Assistant 2), a multimodal large language model designed for advanced visuospatial reasoning. The repository also provides training scripts for the original ViCA model.

Language: Python - Size: 4.3 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 3 - Forks: 0

yu-rp/Dimple

Dimple, a Discrete Diffusion Multimodal Large Language Model

Size: 1.95 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

315386775/OpenPathoFoundation

A collection and organization of resources related to large AI models for pathology

Size: 229 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 3 - Forks: 0

Hyeongkeun/LAVCap

Official PyTorch Implementation of 'LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport' (ICASSP2025)

Language: Python - Size: 3.58 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 3 - Forks: 0

HyeonjeongHa/MM-PoisonRAG

Official PyTorch implementation of "MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks"

Language: Python - Size: 28.5 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

dvlab-research/LSDBench

A benchmark that focuses on the sampling dilemma in long-video tasks. Through well-designed tasks, it evaluates the sampling efficiency of long-video VLMs.

Language: Python - Size: 2.57 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

fork123aniket/Multi-Round-VLM-powered-Multimodal-Conversational-AI-Navigation-Bot

Streamlit App Combining Vision, Language, and Audio AI Models

Language: Python - Size: 18.6 KB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 3 - Forks: 0

Shengwei-Peng/TOCFL-MultiBench

TOCFL-MultiBench: A multimodal benchmark for evaluating Chinese language proficiency using text, audio, and visual data with deep learning. Features Selective Token Constraint Mechanism (STCM) for enhanced decoding stability.

Language: Python - Size: 170 KB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 3 - Forks: 0

sitamgithub-MSIT/streamlit-app-builder

A Streamlit-based AI assistant that generates custom Streamlit app code from user-provided images or text using the Google Gemini model.

Language: Python - Size: 934 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 3 - Forks: 3

patrick-tssn/MM-NIAVH

Pressure Testing Large Video-Language Models (LVLMs): performing multimodal retrieval from LVLMs at any video length to measure accuracy

Language: Python - Size: 29.7 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 3 - Forks: 1

hari-huynh/viVQA-voice-assistant

Voice assistant using Multimodal LLMs - LLaVA-NeXT (Mistral 7B) finetuned & PhoWhisper

Language: Python - Size: 5.96 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 3

PRAISELab-PicusLab/MMMED

🩺 MMMED is a benchmark dataset for evaluating Vision-Language Models (VLMs) on medical multiple-choice question answering (MCQA) tasks. 🏥💡 It features 194 real-world medical questions from Spanish MIR residency exams, available in 🇪🇸 Spanish, 🇬🇧 English, and 🇮🇹 Italian.

Language: Jupyter Notebook - Size: 730 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2 - Forks: 0

UKPLab/arxiv2025-misleading-visualizations

Code and datasets accompanying the arXiv preprint: "Protecting multimodal large language models against misleading visualizations"

Language: JavaScript - Size: 22.6 MB - Last synced at: 2 days ago - Pushed at: 6 days ago - Stars: 2 - Forks: 0

zjunlp/Knowledge2Data

Spatial Knowledge Graph-Guided Synthesis for Multimodal LLMs

Language: Python - Size: 1.51 MB - Last synced at: 6 days ago - Pushed at: 16 days ago - Stars: 2 - Forks: 0

fscdc/ReasonMap

ReasonMap

Language: Python - Size: 7.45 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

xmed-lab/UniEval

UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation

Language: Python - Size: 26 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 1

pritamqu/RRPO

Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization

Language: Python - Size: 4.35 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

praevalis/Test-It

Test-it! is an AI testing tool designed to generate comprehensive testing instructions and code for digital products based on snapshots and code snippets.

Language: JavaScript - Size: 1.21 MB - Last synced at: 9 days ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 1

puar-playground/Self-Visual-RAG

Implementation of MLLM-based Self-Vision-RAG models

Language: Python - Size: 1.56 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 1

UKPLab/naacl2025-cove

Code associated with the NAACL 2025 paper "COVE: COntext and VEracity prediction for out-of-context images"

Language: Python - Size: 2.51 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

fork123aniket/Agentic-RAG-Story-Generation-with-Multimodal-GenAI

Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling

Language: Python - Size: 94.7 KB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

zjysteven/Awesome-Byte-LLM

A curated list of papers and resources on byte-based large language models (LLMs) — models that operate directly on raw bytes.

Size: 435 KB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0

thisisiron/LLaVA-Pool

🌋 A flexible framework for training and configuring Vision-Language Models

Language: Python - Size: 3.11 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

Glasgow-AI4BioMed/RRG-BioNLP-ACL2024 (fork of X-iZhang/RRG-BioNLP-ACL2024)

[BioNLP ACL'24] Gla-AI4BioMed at RRG24: Visual Instruction-tuned Adaptation for Radiology Report Generation

Language: Python - Size: 737 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

luisrui/Modality-Interference-in-MLLMs

The source code for the paper "Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models"

Language: Python - Size: 0 Bytes - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

hacheyz/FlowchartQA

Create flowchart QA datasets using Python and Mermaid, free of AIGC.

Language: Python - Size: 457 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

Glasgow-AI4BioMed/Libra (fork of X-iZhang/Libra)

[ACL 2025] Libra: Leveraging Temporal Images for Biomedical Radiology Analysis

Language: Python - Size: 13.5 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 0

X-iZhang/RRG-BioNLP-ACL2024

[BioNLP ACL'24] 🔬 Med-CXRGen, developed by Glasgow AI4BioMed Lab, brings vision-language adaptation to biomedical radiology via visual instruction tuning.

Language: Python - Size: 596 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 1

Related Topics
large-language-models (72), multimodal (44), llm (40), vision-language-model (30), mllm (24), large-multimodal-models (18), deep-learning (18), machine-learning (17), artificial-intelligence (15), vlm (15), multimodal-learning (15), benchmark (14), multimodal-deep-learning (14), llms (13), llava (12), chatbot (12), generative-ai (12), large-vision-language-models (12), natural-language-processing (12), reasoning (10), foundation-models (9), multimodality (9), video (8), instruction-tuning (8), visual-question-answering (8), large-language-model (8), computer-vision (8), ai (8), video-understanding (7), llama (7), transformers (7), retrieval-augmented-generation (7), vision-language (7), multimodal-data (7), hallucination (6), rag (6), awesome-list (6), dataset (6), python (6), vision-language-models (6), vision-transformer (5), reinforcement-learning (5), agentic-ai (5), medical-image-analysis (5), streamlit (5), visual-instruction-tuning (5), chatgpt (5), qwen (5), pytorch (4), segmentation (4), in-context-learning (4), video-language-model (4), mixture-of-experts (4), hallucination-detection (4), instruction-following (4), knowledge-graph (4), multi-modality (4), long-video-understanding (4), huggingface-transformers (4), question-answering (4), llama3 (4), hallucination-mitigation (4), huggingface (4), chest-xrays (4), gemini-pro (4), video-question-answering (4), text-to-image-generation (4), radiology-report-generation (4), clip (4), vision-and-language (4), nlp (4), docker (4), fine-tuning (3), reasoning-language-models (3), supervised-finetuning (3), text-to-image (3), audio (3), safety (3), neurips-2024 (3), ai-agents (3), code-generation (3), speech-language-model (3), alignment (3), speech (3), chain-of-thought (3), evaluation (3), llms-benchmarking (3), pinecone (3), aigc (3), gpt-4 (3), mllm-reasoning (3), python3 (3), gradio (3), gemini-api (3), agentic-workflow (3), gpt (3), deepseek-r1 (3), r1 (3), multi-modal (3), benchmarking (3)