Topic: "multimodal-large-language-models"
Lzcstan/DrugLAMP
A PyTorch-based system for highly accurate drug-target interaction predictions utilizing multi-modal large language models to discern structural affinities in drug-target pairs.
Language: Python - Size: 128 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 32 - Forks: 0

xyz9911/FLAME
[AAAI-25 Oral] Official Implementation of "FLAME: Learning to Navigate with Multimodal LLM in Urban Environments"
Language: Python - Size: 8.57 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 30 - Forks: 3

bigai-nlco/VideoTGB
[EMNLP 2024] A Video Chat Agent with Temporal Prior
Language: Python - Size: 51.6 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 29 - Forks: 2

Czi24/Awesome-MLLM-LLM-Colab
Happy experimenting with MLLM and LLM models!
Language: Jupyter Notebook - Size: 17.9 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 29 - Forks: 2

MileBench/MileBench
This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"
Language: Python - Size: 3.52 MB - Last synced at: 4 months ago - Pushed at: 11 months ago - Stars: 29 - Forks: 1

AlignGPT-VL/AlignGPT
Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"
Language: Python - Size: 1.97 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 28 - Forks: 3

patrick-tssn/VideoHallucer
VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
Language: Python - Size: 21.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 27 - Forks: 0

declare-lab/MM-InstructEval
This repository contains code to evaluate various multimodal large language models using different instructions across multiple multimodal content comprehension tasks.
Language: Python - Size: 32.6 MB - Last synced at: 2 months ago - Pushed at: 3 months ago - Stars: 27 - Forks: 1

inst-it/inst-it
Official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning"
Language: Python - Size: 2.66 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 27 - Forks: 0

zjunlp/InstructCell
A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following
Language: Jupyter Notebook - Size: 10.7 MB - Last synced at: 6 days ago - Pushed at: 5 months ago - Stars: 27 - Forks: 5

SuperBruceJia/Awesome-Large-Vision-Language-Model
Awesome Large Vision-Language Model: A Curated List of Large Vision-Language Models
Size: 103 KB - Last synced at: 5 days ago - Pushed at: 9 months ago - Stars: 27 - Forks: 3

Wild-Cooperation-Hub/Awesome-MLLM-Reasoning-Benchmarks
A Comprehensive Survey on Evaluating Reasoning Capabilities in Multimodal Large Language Models.
Size: 89.8 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 26 - Forks: 2

Video-Bench/Video-Bench
Video Generation Benchmark
Language: Python - Size: 10.1 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 25 - Forks: 3

The-Martyr/CausalMM
[ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
Language: Python - Size: 7.1 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 25 - Forks: 2

xmed-lab/MedRegA
[ICLR 2025] MedRegA: Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks
Language: Python - Size: 5.61 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 25 - Forks: 1

mlvlab/LLaMo
Official Implementation (PyTorch) of "LLaMo: Large Language Model-based Molecular Graph Assistant", NeurIPS 2024
Language: Python - Size: 44.9 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 25 - Forks: 1

philfung/computer-use
try Computer Use on your Mac with a few clicks
Language: Python - Size: 105 KB - Last synced at: 2 months ago - Pushed at: 7 months ago - Stars: 24 - Forks: 2

AstraZeneca/vlm
Official implementation for "Diffusion Instruction Tuning"
Language: Python - Size: 25.5 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 23 - Forks: 2

moucheng2017/SOP-LVM-ICL-Ensemble
[NeurIPS VLM workshop 2024] In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models for Low-Level Workflow Understanding
Language: Python - Size: 834 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 22 - Forks: 3

VachanVY/Transfusion.torch
PyTorch Implementation of Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Language: Python - Size: 2.07 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 21 - Forks: 5

multimodal-ai-lab/DEFAME
Fact-checking system for textual and visual inputs.
Language: Python - Size: 30.1 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 20 - Forks: 4

SlytherinGe/RSTeller
Vision-Language Dataset for Remote Sensing
Language: Python - Size: 15.5 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 20 - Forks: 3

willxxy/ECG-Byte
[arxiv 2024] ECG-Byte: A Tokenizer for End-to-End Generative Electrocardiogram Language Modeling
Language: Python - Size: 28.5 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 18 - Forks: 0

vbdi/divprune
[CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Language: Python - Size: 11 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 18 - Forks: 0

eric-ai-lab/MMWorld
Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
Language: Python - Size: 1.47 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 18 - Forks: 1

AIDC-AI/Wings
The code repository for "Wings: Learning Multimodal LLMs without Text-only Forgetting" [NeurIPS 2024]
Language: Python - Size: 2.85 MB - Last synced at: 3 months ago - Pushed at: 6 months ago - Stars: 17 - Forks: 1

ryota-komatsu/slp2025
Materials for the tutorial "Introduction to Multimodal Large Language Models" at 音学シンポジウム2025
Language: Jupyter Notebook - Size: 19.6 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 16 - Forks: 2

willxxy/ECG-Bench
A Unified Framework for Benchmarking Generative Electrocardiogram-Language Models (ELMs)
Language: Python - Size: 6.81 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 15 - Forks: 2

multimindlab/multimind-sdk
Your SDK solves all of this. One interface. Unified logic. Local + hosted models. Fine-tuning. Agent tools. Enterprise-ready. Hybrid RAG. Star 🌟 if you like it!
Language: Python - Size: 46.4 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 15 - Forks: 1

Wang-ML-Lab/interpretable-foundation-models
[ICML 2024] Probabilistic Conceptual Explainers (PACE): Trustworthy Conceptual Explanations for Vision Foundation Models
Language: Python - Size: 52.7 KB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 15 - Forks: 3

gautierdag/plancraft
Plancraft is a Minecraft environment and agent suite to test planning capabilities in LLMs
Language: Python - Size: 124 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 15 - Forks: 0

The-Martyr/Awesome-Modality-Priors-in-MLLMs
Latest Advances on Modality Priors in Multimodal Large Language Models
Size: 76.2 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 15 - Forks: 1

X-iZhang/Libra
[ACL 2025] ⚖️ Temporally-aware MLLM for Biomedical Radiology Analysis and Report Generation. Flexible toolkit with LLM backbone support, real-time validation, training resumption, and smart model saving.
Language: Python - Size: 13.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 14 - Forks: 1

mlvlab/VidChain
Official Implementation (PyTorch) of "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning", AAAI 2025
Language: Python - Size: 9.02 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 13 - Forks: 0

PanguIR/MRAGSurvey
A Survey of Multimodal Retrieval-Augmented Generation
Size: 4.92 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 11 - Forks: 1

rohit901/VANE-Bench
[NAACL'25] Contains code and documentation for our VANE-Bench paper.
Language: Python - Size: 38.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 11 - Forks: 1

somvy/multimodal_unlearning
Experiments for our CLEAR benchmark of unlearning methods in a multimodal setup
Language: Python - Size: 112 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 10 - Forks: 0

Manchery/awesome-visual-tokenizer
[WIP🚧] 2025 up-to-date list of resources on visual tokenizers (primarily for visual generation). Give it a star 🌟 if you find it useful.
Size: 6.84 KB - Last synced at: about 1 month ago - Pushed at: 6 months ago - Stars: 10 - Forks: 0

hpc203/Chinese-CLIP-opencv-onnxrun
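Deploys Chinese-CLIP with OpenCV and onnxruntime for text-to-image search: given a one-sentence description of the desired image, it retrieves matching images from a gallery. Includes both C++ and Python versions.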
Language: C++ - Size: 4.03 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 1

akusayudodograu/Agentic-RAG-Story-Generation-with-Multimodal-GenAI
Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling
Size: 1000 Bytes - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 9 - Forks: 1

declare-lab/Auto-Scaling
[Arxiv 2024] Official Implementation of the paper: "Towards Robust Instruction Tuning on Multimodal Large Language Models"
Language: Jupyter Notebook - Size: 67.6 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 9 - Forks: 1

eric-ai-lab/MSSBench
[ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"
Language: Python - Size: 1.51 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 9 - Forks: 1

hrlics/LITE
[COLM 2024] LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models
Language: Python - Size: 1.31 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 9 - Forks: 0

OmniMMI/OmniMMI
[CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
Language: Python - Size: 25.3 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 8 - Forks: 0

mashijie1028/GenHancer
A post-training method to enhance CLIP's fine-grained visual representations with generative models.
Language: Python - Size: 2.56 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 8 - Forks: 0

Z1zs/MMNeuron
Official implementation of "MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model". Our codes are borrowed from Tang's language specific neurons implementation and nrimsky's logit lens implementation.
Language: Python - Size: 28.3 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 8 - Forks: 0

BillChan226/MJ-Bench
Official implementation for "MJ-BENCH: Is Your Multimodal Reward Model Really a Good Judge?"
Language: Jupyter Notebook - Size: 2.56 GB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 8 - Forks: 0

Hoar012/TDC-Video
Language: Python - Size: 5.75 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 7 - Forks: 0

CristianoPatricio/CBVLM
Code for the paper "CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification".
Language: Python - Size: 903 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 7 - Forks: 1

inFaaa/Evolver
[COLING 2025🔥] Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection
Language: Python - Size: 202 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 7 - Forks: 0

NKU-MetautoAI/awesome-large-vision-language-models
Advances in recent large vision language models (LVLMs)
Size: 32.7 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 7 - Forks: 0

msamprovalaki/Exploring-Multimodal-Large-Language-Models-for-Medical-Image-Captioning
This repository includes the code for my Master Thesis, which investigates the application of Multimodal Large Language Models (MLLMs) for medical image captioning
Language: Python - Size: 5.45 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 6 - Forks: 0

gaotiexinqu/V2P-Bench
V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction
Language: Python - Size: 26.8 MB - Last synced at: 27 days ago - Pushed at: 27 days ago - Stars: 6 - Forks: 0

jolibrain/colette
Search and interact locally with technical documents of any kind
Language: HTML - Size: 7.74 MB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 6 - Forks: 4

AceCHQ/MMIQ
This repo contains evaluation code for MM-IQ benchmark.
Language: Jupyter Notebook - Size: 1.63 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 6 - Forks: 0

BUAADreamer/Qwen2-VL-History
A LLaMA-Factory fine-tuning example for Qwen2-VL in the culture and tourism domain (historical literature and museums)
Size: 73.8 MB - Last synced at: 3 months ago - Pushed at: 9 months ago - Stars: 6 - Forks: 2

abdur75648/V-Zen
V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM Resources
Size: 5.86 KB - Last synced at: 7 days ago - Pushed at: 11 months ago - Stars: 6 - Forks: 3

zjr2000/REVERIE
[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
Language: Python - Size: 1.12 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 6 - Forks: 0

GerrySant/multimodalhugs
MultimodalHugs is an extension of Hugging Face that offers a generalized framework for training, evaluating, and using multimodal AI models with minimal code differences, ensuring seamless compatibility with Hugging Face pipelines.
Language: Python - Size: 4.38 MB - Last synced at: about 8 hours ago - Pushed at: about 9 hours ago - Stars: 5 - Forks: 2

RainBowLuoCS/Awesome-Unified-Multimodal-Understanding-and-Generation
📰 Must-read papers on Unified Multimodal Understanding and Generation (constantly updating 🤗).
Size: 11.7 KB - Last synced at: 2 days ago - Pushed at: 7 days ago - Stars: 5 - Forks: 0

aliencaocao/vlm-for-memes-aisg
Language: Jupyter Notebook - Size: 15.8 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 5 - Forks: 0

araobp/virtual-showroom
A virtual showroom with virtual promotional models (Chatbots) and a 240-degree screen for VR experiences with naked eyes.
Language: C# - Size: 1.33 GB - Last synced at: 2 months ago - Pushed at: 7 months ago - Stars: 5 - Forks: 0

emanalytic/MultiModal-E-Commerce-Customer-Support-Chatbot
Multimodal Customer Service Chatbot
Language: Python - Size: 154 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 4 - Forks: 1

Hazel-Heejeong-Nam/VAGUE
An official repository of "VAGUE: Visual Contexts Clarify Ambiguous Expressions"
Language: Python - Size: 3.07 MB - Last synced at: 27 days ago - Pushed at: 28 days ago - Stars: 4 - Forks: 0

AMD-AIG-AIMA/gpt-fast
The GPT-Fast for Multimodal Models on AMD GPUs
Language: Python - Size: 6.03 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 4 - Forks: 0

NotYuSheng/Multimodal-Large-Language-Model
Localized Multimodal Large Language Model (MLLM) integrated with Streamlit and Ollama for text and image processing tasks.
Language: Python - Size: 7.37 MB - Last synced at: about 2 months ago - Pushed at: 2 months ago - Stars: 4 - Forks: 2

defnecirci/MatSciTableExtract
Extracting structured materials science data from tables using LLMs
Language: Python - Size: 147 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 4 - Forks: 0

Gen-Verse/HermesFlow
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
Language: Python - Size: 1.91 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 4 - Forks: 0

alexander-moore/vlm
Composition of Multimodal Language Models From Scratch
Language: Jupyter Notebook - Size: 1.32 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 4 - Forks: 0

Vignesh010101/Intelligent-Health-LLM-System
An Intelligent Health LLM System for Personalized Medication Guidance and Support.
Language: Jupyter Notebook - Size: 620 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 3 - Forks: 0

LAMDA-Tabular/MMTU
The comprehensive MMTU Benchmark for testing MLLMs' capability in Tabular Understanding
Language: Python - Size: 1.34 MB - Last synced at: 15 days ago - Pushed at: 16 days ago - Stars: 3 - Forks: 0

xavier-yu114/Zoom-Refine
Zoom-Refine: Boosting High-Resolution Multimodal Understanding via Localized Zoom and Self-Refinement
Language: Python - Size: 3.64 MB - Last synced at: 17 days ago - Pushed at: 18 days ago - Stars: 3 - Forks: 0

nkkbr/ViCA
This is the official implementation of ViCA2 (Visuospatial Cognitive Assistant 2), a multimodal large language model designed for advanced visuospatial reasoning. The repository also provides training scripts for the original ViCA model.
Language: Python - Size: 4.3 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 3 - Forks: 0

yu-rp/Dimple
Dimple, a Discrete Diffusion Multimodal Large Language Model
Size: 1.95 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

315386775/OpenPathoFoundation
A collection and overview of resources on large AI models for pathology
Size: 229 KB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 3 - Forks: 0

Hyeongkeun/LAVCap
Official PyTorch Implementation of 'LAVCap: LLM-based Audio-Visual Captioning using Optimal Transport' (ICASSP 2025)
Language: Python - Size: 3.58 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 3 - Forks: 0

HyeonjeongHa/MM-PoisonRAG
Official PyTorch implementation of "MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks"
Language: Python - Size: 28.5 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

dvlab-research/LSDBench
A benchmark that focuses on the sampling dilemma in long-video tasks. Through well-designed tasks, it evaluates the sampling efficiency of long-video VLMs.
Language: Python - Size: 2.57 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

fork123aniket/Multi-Round-VLM-powered-Multimodal-Conversational-AI-Navigation-Bot
Streamlit App Combining Vision, Language, and Audio AI Models
Language: Python - Size: 18.6 KB - Last synced at: about 2 months ago - Pushed at: 5 months ago - Stars: 3 - Forks: 0

Shengwei-Peng/TOCFL-MultiBench
TOCFL-MultiBench: A multimodal benchmark for evaluating Chinese language proficiency using text, audio, and visual data with deep learning. Features Selective Token Constraint Mechanism (STCM) for enhanced decoding stability.
Language: Python - Size: 170 KB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 3 - Forks: 0

sitamgithub-MSIT/streamlit-app-builder
A Streamlit-based AI assistant that generates custom Streamlit app code from user-provided images or text using the Google Gemini model.
Language: Python - Size: 934 KB - Last synced at: about 1 month ago - Pushed at: 8 months ago - Stars: 3 - Forks: 3

patrick-tssn/MM-NIAVH
Pressure Testing Large Video-Language Models (LVLMs): multimodal retrieval from LVLMs at any video length to measure accuracy
Language: Python - Size: 29.7 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 3 - Forks: 1

hari-huynh/viVQA-voice-assistant
Voice assistant using Multimodal LLMs - LLaVA-NeXT (Mistral 7B) finetuned & PhoWhisper
Language: Python - Size: 5.96 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 3

PRAISELab-PicusLab/MMMED
🩺 MMMED is a benchmark dataset for evaluating Vision-Language Models (VLMs) on medical multiple-choice question answering (MCQA) tasks. 🏥💡 It features 194 real-world medical questions from Spanish MIR residency exams, available in 🇪🇸 Spanish, 🇬🇧 English, and 🇮🇹 Italian.
Language: Jupyter Notebook - Size: 730 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2 - Forks: 0

UKPLab/arxiv2025-misleading-visualizations
Code and datasets accompanying the arXiv preprint: "Protecting multimodal large language models against misleading visualizations"
Language: JavaScript - Size: 22.6 MB - Last synced at: 2 days ago - Pushed at: 6 days ago - Stars: 2 - Forks: 0

zjunlp/Knowledge2Data
Spatial Knowledge Graph-Guided Synthesis for Multimodal LLMs
Language: Python - Size: 1.51 MB - Last synced at: 6 days ago - Pushed at: 16 days ago - Stars: 2 - Forks: 0

fscdc/ReasonMap
ReasonMap
Language: Python - Size: 7.45 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

xmed-lab/UniEval
UniEval: Unified Holistic Evaluation for Unified Multimodal Understanding and Generation
Language: Python - Size: 26 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 1

pritamqu/RRPO
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
Language: Python - Size: 4.35 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

praevalis/Test-It
Test-it! is an AI testing tool designed to generate comprehensive testing instructions and code for digital products based on snapshots and code snippets.
Language: JavaScript - Size: 1.21 MB - Last synced at: 9 days ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 1

puar-playground/Self-Visual-RAG
Implementation of MLLM-based Self-Vision-RAG models
Language: Python - Size: 1.56 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 2 - Forks: 1

UKPLab/naacl2025-cove
Code associated with the NAACL 2025 paper "COVE: COntext and VEracity prediction for out-of-context images"
Language: Python - Size: 2.51 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

fork123aniket/Agentic-RAG-Story-Generation-with-Multimodal-GenAI
Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling
Language: Python - Size: 94.7 KB - Last synced at: 2 months ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

zjysteven/Awesome-Byte-LLM
A curated list of papers and resources on byte-based large language models (LLMs) — models that operate directly on raw bytes.
Size: 435 KB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0

thisisiron/LLaVA-Pool
🌋 A flexible framework for training and configuring Vision-Language Models
Language: Python - Size: 3.11 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

Glasgow-AI4BioMed/RRG-BioNLP-ACL2024 (fork of X-iZhang/RRG-BioNLP-ACL2024)
[BioNLP ACL'24] Gla-AI4BioMed at RRG24: Visual Instruction-tuned Adaptation for Radiology Report Generation
Language: Python - Size: 737 KB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

luisrui/Modality-Interference-in-MLLMs
The source code for the paper "Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models"
Language: Python - Size: 0 Bytes - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

hacheyz/FlowchartQA
Create flowchart QA datasets using Python and Mermaid, free of AIGC.
Language: Python - Size: 457 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 1 - Forks: 0

Glasgow-AI4BioMed/Libra (fork of X-iZhang/Libra)
[ACL 2025] Libra: Leveraging Temporal Images for Biomedical Radiology Analysis
Language: Python - Size: 13.5 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 0

X-iZhang/RRG-BioNLP-ACL2024
[BioNLP ACL'24] 🔬 Med-CXRGen, developed by Glasgow AI4BioMed Lab, brings vision-language adaptation to biomedical radiology via visual instruction tuning.
Language: Python - Size: 596 KB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 1 - Forks: 1
