Topic: "vlms"
oumi-ai/oumi
Easily fine-tune, evaluate and deploy Qwen3, DeepSeek-R1, Llama 4 or any open source LLM / VLM!
Language: Python - Size: 9.12 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 8,112 - Forks: 595

yueliu1999/Awesome-Jailbreak-on-LLMs
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, code, datasets, evaluations, and analyses.
Size: 648 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 686 - Forks: 60

NanoNets/docext
An on-premises, OCR-free unstructured data extraction and benchmarking toolkit.
Language: Python - Size: 2.84 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 374 - Forks: 25

dvlab-research/VisionZip
Official repository for VisionZip (CVPR 2025)
Language: Python - Size: 18.2 MB - Last synced at: 26 days ago - Pushed at: 3 months ago - Stars: 274 - Forks: 12

tianyi-lab/HallusionBench
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
Language: Python - Size: 11.1 MB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 270 - Forks: 8

Beckschen/ViTamin
[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"
Language: Python - Size: 56 MB - Last synced at: 21 days ago - Pushed at: 12 months ago - Stars: 204 - Forks: 6

MCG-NJU/AWT
[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation
Language: Python - Size: 12.3 MB - Last synced at: 6 months ago - Pushed at: 8 months ago - Stars: 79 - Forks: 1

mbzuai-oryx/KITAB-Bench
[ACL 2025 🔥] A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
Language: Python - Size: 26.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 35 - Forks: 2

Mamadou-Keita/VLM-DETECT
[ICASSP 2024] The official repo for Harnessing the Power of Large Vision Language Models for Synthetic Image Detection
Language: Python - Size: 134 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 21 - Forks: 2

ShenzheZhu/JailDAM
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
Size: 3.52 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 11 - Forks: 0

ThomasVonWu/Awesome-VLMs-Strawberry
A collection of VLMs papers, blogs, and projects, with a focus on VLMs in Autonomous Driving and related reasoning techniques.
Size: 760 KB - Last synced at: 14 days ago - Pushed at: 6 months ago - Stars: 10 - Forks: 1

TUM-AVS/FM-for-Scenario-Generation-Analysis
This repository collects research papers on large Foundation Models for Scenario Generation and Analysis in Autonomous Driving. It will be updated continuously to track the latest work.
Size: 2.24 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 9 - Forks: 1

foundation-multimodal-models/CAL
Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment
Language: Python - Size: 1.78 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 9 - Forks: 0

aim-uofa/SegAgent
[CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
Size: 46.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 8 - Forks: 0

SrGrace/generative-ai-compass
A comprehensive guide to navigating the world of generative artificial intelligence!
Size: 27.9 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 4 - Forks: 0

Raymond-Qiancx/Awesome-Multimodal-Machine-Learning-Papers
A taxonomy and listing of notable recent studies in advanced multimodal machine learning.
Size: 690 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 4 - Forks: 1

VectorInstitute/VLDBench
VLDBench: A large-scale benchmark for evaluating Vision-Language Models (VLMs) and Large Language Models (LLMs) on multimodal disinformation detection.
Language: Python - Size: 259 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

maokangkun/SigmaFlow
SigmaFlow is a Python package designed to optimize the performance of task flows involving LLMs, MLLMs, or multi-agent systems.
Language: Python - Size: 7.74 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 2 - Forks: 0

Masoudjafaripour/FM_RL_Survey
A repo for the survey paper "The Evolving Landscape of LLM- and VLM-Integrated Reinforcement Learning" and a collection of AWESOME papers focused on using LLMs and VLMs to improve RL.
Size: 0 Bytes - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 2 - Forks: 0

yasho191/SwiftAnnotate
Auto labelling tool for Text, Image, Video
Language: Python - Size: 2.26 MB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

hucebot/words2contact
Official implementation of "Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models" (IEEE Humanoids 2024).
Language: Python - Size: 13.5 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0

Imageomics/VLM4Bio
Code for VLM4Bio, a benchmark dataset of scientific question-answer pairs used to evaluate pretrained VLMs for trait discovery from biological images.
Language: Python - Size: 2.51 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 2 - Forks: 2

KT313/assistant_base
A custom framework for easy use of LLMs, VLMs, etc. supporting various modes and settings via web-ui
Language: Jupyter Notebook - Size: 1.18 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

angmavrogiannis/Embodied-Attribute-Detection
Code for the ICRA 2025 paper: Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs
Language: Python - Size: 35.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

JiHoonLee9898/RVCD
[ACL findings 2025] "Retrieval Visual Contrastive Decoding to Mitigate Object Hallucinations in Large Vision-Language Models"
Language: Python - Size: 155 MB - Last synced at: about 23 hours ago - Pushed at: about 23 hours ago - Stars: 0 - Forks: 0

vijaysr4/MMEL
Research Project 1 - Multimodal Entity Linking with VLMs on WikiData
Language: Python - Size: 39.1 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

PGSmall/clip-pgs
Official code for CVPR2025 "Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection"
Language: Python - Size: 8.97 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 1

Someboi1681/BobVLM
BobVLM – A 1.5B multimodal model built from scratch and pre-trained on a single P100 GPU, capable of image description and moderate question answering. 🤗🎉
Size: 1000 Bytes - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

khurramHashmi/LLaVA-v1.6-Mistral-7b-Finetune-ORPO-RLAIF-V (fork of haotian-liu/LLaVA)
Align llava-v1.6-mistral-7b on the RLAIF-V dataset using ORPO
Language: Python - Size: 19.7 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

werywjw/MultiClimate
[EMNLP 2024 Workshop NLP4PI] 🌏 MultiClimate: Multimodal Stance Detection on Climate Change Videos 🌎
Language: Jupyter Notebook - Size: 1.7 GB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0
