Topic: "visual-language-models"
zai-org/CogVLM
a state-of-the-art-level open visual language model | multimodal pre-trained model
Language: Python - Size: 25.8 MB - Last synced at: 1 day ago - Pushed at: about 1 year ago - Stars: 6,626 - Forks: 434

camel-ai/crab
🦀️ CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/
Language: Python - Size: 8.77 MB - Last synced at: 4 days ago - Pushed at: 25 days ago - Stars: 357 - Forks: 51

MiniMax-AI/One-RL-to-See-Them-All
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Language: Python - Size: 13 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 230 - Forks: 5

bilel-bj/ROSGPT_Vision
Commanding robots using only Language Models' prompts
Language: Python - Size: 21.4 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 98 - Forks: 13

hk-zh/language-conditioned-robot-manipulation-models
https://arxiv.org/abs/2312.10807
Size: 1.28 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 51 - Forks: 1

kesimeg/awesome-turkish-language-models
A curated list of Turkish AI models, datasets, papers
Size: 7.81 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 36 - Forks: 0

jaisidhsingh/CoN-CLIP
Implementation of the "Learn No to Say Yes Better" paper.
Language: Python - Size: 4.36 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 31 - Forks: 2

tianyu-z/VCR
Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.
Language: Python - Size: 6.58 MB - Last synced at: 4 months ago - Pushed at: 5 months ago - Stars: 31 - Forks: 2

AlignGPT-VL/AlignGPT
Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"
Language: Python - Size: 1.97 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 28 - Forks: 3

csebuetnlp/IllusionVQA
This repository contains the data and code of the paper titled "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models"
Language: Jupyter Notebook - Size: 87.7 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 13 - Forks: 2

BioMedIA-MBZUAI/FetalCLIP
Official repository of FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis
Language: Python - Size: 12.6 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 13 - Forks: 0

sduzpf/UAP_VLP
Universal Adversarial Perturbations for Vision-Language Pre-trained Models
Language: Python - Size: 14.8 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 13 - Forks: 0

Sid2697/HOI-Ref
Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"
Language: Python - Size: 6.73 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 2

amathislab/wildclip
Scene and animal attribute retrieval from camera trap data with domain-adapted vision-language models
Language: Python - Size: 5.27 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 0

declare-lab/Sealing
[NAACL 2024] Official Implementation of paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"
Language: Python - Size: 8.92 MB - Last synced at: 4 months ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 3

GraphPKU/CoI
Chain of Images for Intuitively Reasoning
Language: Python - Size: 5.17 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 1

shreydan/VLM-OD
experimental: finetune smolVLM on COCO (without any special <locXYZ> tokens)
Language: Jupyter Notebook - Size: 9.84 MB - Last synced at: about 1 month ago - Pushed at: 2 months ago - Stars: 7 - Forks: 1

kornia/kornia-paligemma
Rust implementation of Google PaliGemma with Candle
Language: Rust - Size: 2.77 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 6 - Forks: 1

ArthurBabkin/Parimate
A Telegram bot for validating audio and video content using CV models, SR models, and VLMs, with deepfake detection leveraging metadata analysis.
Language: Python - Size: 115 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 6 - Forks: 1

AikyamLab/hallucinogen
A benchmark for evaluating hallucinations in large visual language models
Language: Python - Size: 1.32 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 6 - Forks: 0

nkkbr/ViCA
This is the official implementation of ViCA2 (Visuospatial Cognitive Assistant 2), a multimodal large language model designed for advanced visuospatial reasoning. The repository also provides training scripts for the original ViCA model.
Language: Python - Size: 4.31 MB - Last synced at: 4 days ago - Pushed at: 5 days ago - Stars: 3 - Forks: 0

yangjie-cv/WeThink
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning
Language: Python - Size: 1.58 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 3 - Forks: 0

xinyanghuang7/Basic-Visual-Language-Model
Build a simple, basic multimodal large model from scratch. 🤖
Language: Python - Size: 34.7 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

CristianoPatricio/concept-based-interpretability-VLM
Code for the paper "Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models", ISBI 2024 (Oral).
Language: Jupyter Notebook - Size: 21.2 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 3 - Forks: 0

kornia/kornia-infernum
👺 Rust Inference engine for Visual Language Models
Language: Rust - Size: 43.9 KB - Last synced at: 2 days ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

ARResearch-1/DiverseAR-Dataset
Advancing the Understanding and Evaluation of AR-Generated Scenes: When Vision-Language Models Shine and Stumble
Size: 4.59 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 2 - Forks: 0

rng190001/CS6375-ResearchProject
Visual language model project that tests different techniques for parsing generated responses
Language: Jupyter Notebook - Size: 880 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

fullscreen-triangle/pakati
A specialized tool that provides granular control over AI image generation by enabling region-based prompting, editing, and transformation with metacognitive orchestration.
Language: Python - Size: 3.37 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

legalaspro/modern_ai_foundations
A collection of implementations exploring modern AI architectures and foundational models.
Language: Jupyter Notebook - Size: 40.4 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

Rooshikesh/Item-Inspector-AI
AI-based product condition detection using BLIP-2 + FastAPI + Phi-4 (Ollama)
Language: Python - Size: 1.38 MB - Last synced at: about 1 month ago - Pushed at: about 2 months ago - Stars: 0 - Forks: 0

rs-anderson/explicit-alignment-for-vqa-tasks
Explicit alignment for few-shot Visual Question Answering.
Language: Python - Size: 20 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

DavidLMS/DescribePDF
A tool to convert PDF files to detailed Markdown descriptions using VLMs
Language: Python - Size: 1.2 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

HemantM29/Multimodal-Document-Analysis-and-Query-Retrieval
This project performs multimodal document analysis and query retrieval by downloading PDFs, converting pages to images, indexing them for semantic search, and analyzing retrieved images using visual-language models like Qwen2VL and Blip2.
Language: Jupyter Notebook - Size: 1.9 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

laclouis5/uform-coreml-converters
CLI for converting UForm models to CoreML.
Language: Python - Size: 121 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
