An open API service providing repository metadata for many open source software ecosystems.

Topic: "vlms"

oumi-ai/oumi

Easily fine-tune, evaluate and deploy Qwen3, DeepSeek-R1, Llama 4 or any open source LLM / VLM!

Language: Python - Size: 9.12 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 8,112 - Forks: 595

yueliu1999/Awesome-Jailbreak-on-LLMs

Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art and novel jailbreak methods for LLMs, including papers, code, datasets, evaluations, and analyses.

Size: 648 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 686 - Forks: 60

NanoNets/docext

An on-premises, OCR-free unstructured data extraction and benchmarking toolkit.

Language: Python - Size: 2.84 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 374 - Forks: 25

dvlab-research/VisionZip

Official repository for VisionZip (CVPR 2025)

Language: Python - Size: 18.2 MB - Last synced at: 26 days ago - Pushed at: 3 months ago - Stars: 274 - Forks: 12

tianyi-lab/HallusionBench

[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Language: Python - Size: 11.1 MB - Last synced at: 4 months ago - Pushed at: 7 months ago - Stars: 270 - Forks: 8

Beckschen/ViTamin

[CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era"

Language: Python - Size: 56 MB - Last synced at: 21 days ago - Pushed at: 12 months ago - Stars: 204 - Forks: 6

MCG-NJU/AWT

[NeurIPS 2024] AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation

Language: Python - Size: 12.3 MB - Last synced at: 6 months ago - Pushed at: 8 months ago - Stars: 79 - Forks: 1

mbzuai-oryx/KITAB-Bench

[ACL 2025 🔥] A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding

Language: Python - Size: 26.3 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 35 - Forks: 2

Mamadou-Keita/VLM-DETECT

[ICASSP 2024] The official repo for "Harnessing the Power of Large Vision Language Models for Synthetic Image Detection"

Language: Python - Size: 134 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 21 - Forks: 2

ShenzheZhu/JailDAM

JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model

Size: 3.52 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 11 - Forks: 0

ThomasVonWu/Awesome-VLMs-Strawberry

A collection of VLM papers, blogs, and projects, with a focus on VLMs in autonomous driving and related reasoning techniques.

Size: 760 KB - Last synced at: 14 days ago - Pushed at: 6 months ago - Stars: 10 - Forks: 1

TUM-AVS/FM-for-Scenario-Generation-Analysis

This repository collects research papers on large foundation models for scenario generation and analysis in autonomous driving. It will be continuously updated to track the latest work.

Size: 2.24 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 9 - Forks: 1

foundation-multimodal-models/CAL

Official PyTorch implementation of "Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment"

Language: Python - Size: 1.78 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 9 - Forks: 0

aim-uofa/SegAgent

[CVPR 2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories

Size: 46.1 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 8 - Forks: 0

SrGrace/generative-ai-compass

A comprehensive guide to navigating the world of generative artificial intelligence!

Size: 27.9 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 4 - Forks: 0

Raymond-Qiancx/Awesome-Multimodal-Machine-Learning-Papers

A taxonomy and list of influential recent studies in advanced multimodal machine learning.

Size: 690 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 4 - Forks: 1

VectorInstitute/VLDBench

VLDBench: A large-scale benchmark for evaluating Vision-Language Models (VLMs) and Large Language Models (LLMs) on multimodal disinformation detection.

Language: Python - Size: 259 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 3 - Forks: 0

maokangkun/SigmaFlow

SigmaFlow is a Python package designed to optimize the performance of task flows involving LLMs/MLLMs or multi-agent systems.

Language: Python - Size: 7.74 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 2 - Forks: 0

Masoudjafaripour/FM_RL_Survey

A repo for the survey paper "The Evolving Landscape of LLM- and VLM-Integrated Reinforcement Learning" and a collection of AWESOME papers focused on using LLMs and VLMs to improve RL.

Size: 0 Bytes - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 2 - Forks: 0

yasho191/SwiftAnnotate

Auto-labelling tool for text, images, and video.

Language: Python - Size: 2.26 MB - Last synced at: 7 days ago - Pushed at: 2 months ago - Stars: 2 - Forks: 0

hucebot/words2contact

Official implementation of "Words2Contact: Identifying Support Contacts from Verbal Instructions Using Foundation Models" (IEEE Humanoids 2024).

Language: Python - Size: 13.5 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0

Imageomics/VLM4Bio

Code for VLM4Bio, a benchmark dataset of scientific question-answer pairs used to evaluate pretrained VLMs for trait discovery from biological images.

Language: Python - Size: 2.51 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 2 - Forks: 2

KT313/assistant_base

A custom framework for easy use of LLMs, VLMs, etc., supporting various modes and settings via a web UI.

Language: Jupyter Notebook - Size: 1.18 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 1 - Forks: 0

angmavrogiannis/Embodied-Attribute-Detection

Code for the ICRA 2025 paper: Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs

Language: Python - Size: 35.9 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

JiHoonLee9898/RVCD

[ACL Findings 2025] "Retrieval Visual Contrastive Decoding to Mitigate Object Hallucinations in Large Vision-Language Models"

Language: Python - Size: 155 MB - Last synced at: about 23 hours ago - Pushed at: about 23 hours ago - Stars: 0 - Forks: 0

vijaysr4/MMEL

Research Project 1 - Multimodal Entity Linking with VLMs on WikiData

Language: Python - Size: 39.1 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 0 - Forks: 0

PGSmall/clip-pgs

Official code for CVPR 2025 "Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection"

Language: Python - Size: 8.97 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 0 - Forks: 1

Someboi1681/BobVLM

BobVLM: a 1.5B multimodal model built from scratch and pre-trained on a single P100 GPU, capable of image description and moderate question answering. 🤗🎉

Size: 1000 Bytes - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

khurramHashmi/LLaVA-v1.6-Mistral-7b-Finetune-ORPO-RLAIF-V (fork of haotian-liu/LLaVA)

Aligns llava-v1.6-mistral-7b on the RLAIF-V dataset using ORPO.

Language: Python - Size: 19.7 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

werywjw/MultiClimate

[EMNLP 2024 Workshop NLP4PI]🌏 MultiClimate: Multimodal Stance Detection on Climate Change Videos 🌎

Language: Jupyter Notebook - Size: 1.7 GB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0