GitHub topics: vlm
SuyogKamble/simpleVLM
Building a simple VLM. Implementing LLaMA-SmolLM2 from scratch + SigLip2 Vision Model. KV-Caching is also supported and implemented from scratch.
Language: Jupyter Notebook - Size: 7.33 MB - Last synced at: about 3 hours ago - Pushed at: about 4 hours ago - Stars: 0 - Forks: 0

heshengtao/comfyui_LLM_party
LLM Agent Framework in ComfyUI. Includes MCP server, Omost, GPT-sovits, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, plus access to Feishu and discord. Adapts to all llms with similar openai / aisuite interfaces, such as o1, ollama, gemini, grok, qwen, GLM, deepseek, kimi, doubao. Adapted to local llms, vlm, gguf such as llama-3.3 Janus-Pro, Linkage graphRAG
Language: Python - Size: 135 MB - Last synced at: about 8 hours ago - Pushed at: about 8 hours ago - Stars: 1,676 - Forks: 141

xlang-ai/OSWorld
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Language: Python - Size: 46.7 MB - Last synced at: about 12 hours ago - Pushed at: about 13 hours ago - Stars: 1,836 - Forks: 228

Aident-AI/open-cuak
Reliable Automation Agents at Scale
Language: TypeScript - Size: 5.75 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 344 - Forks: 26

VLMHyperBenchTeam/VLMHyperBench
VLMHyperBench is an open-source framework for evaluating how well vision language models (VLMs) recognize Russian-language documents, in order to assess their potential for automating document workflows.
Language: Python - Size: 2.82 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 54 - Forks: 0

ShimaN19/Facial__Attributes
Fine‑tuned CLIP‑based VLM for few‑shot facial attribute classification on WFLW—clone, pip install -e ., and hit the main script to train or infer in seconds.
Language: Jupyter Notebook - Size: 731 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 0 - Forks: 0

VectorInstitute/vector-inference
Efficient LLM inference on Slurm clusters using vLLM.
Language: Python - Size: 2.79 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 59 - Forks: 10

awwaiid/ghostwriter
Use the reMarkable2 as an interface to vision-LLMs (ChatGPT, Claude, Gemini). Ghost in the machine!
Language: Rust - Size: 5.98 MB - Last synced at: 2 days ago - Pushed at: 14 days ago - Stars: 454 - Forks: 17

bytedance/UI-TARS-desktop
A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
Language: TypeScript - Size: 43.8 MB - Last synced at: 2 days ago - Pushed at: 3 days ago - Stars: 13,258 - Forks: 1,072

intelligolabs/CoIN
Official repository of "Collaborative Instance Navigation: Leveraging Agent Self-Dialogue to Minimize User Input"
Language: Jupyter Notebook - Size: 11.4 MB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 4 - Forks: 0

kesimeg/awesome-turkish-language-models
A curated list of Turkish AI models, datasets, papers
Size: 4.88 KB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 27 - Forks: 0

vlm-run/vlmrun-python-sdk
Official Python SDK for VLM Run
Language: Python - Size: 1.2 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 8 - Forks: 0

MiniMax-AI/MiniMax-01
The official repo of MiniMax-Text-01 and MiniMax-VL-01, large-language-model & vision-language-model based on Linear Attention
Language: Python - Size: 9.18 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 2,586 - Forks: 194

JoeJoe1313/PaliGemma-Image-Segmentation
An app with FastAPI, Docker, transformers, JAX/Flax for performing image segmentation with PaliGemma 2 mix
Language: Python - Size: 7.46 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 0 - Forks: 0

NexaAI/nexa-sdk
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR), and text-to-speech (TTS) capabilities.
Language: Python - Size: 195 MB - Last synced at: 2 days ago - Pushed at: 2 months ago - Stars: 4,531 - Forks: 628

mgonzs13/llama_ros
llama.cpp (GGUF LLMs) and llava.cpp (GGUF VLMs) for ROS 2
Language: C++ - Size: 7.02 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 199 - Forks: 32

roboflow/notebooks
This repository offers a comprehensive collection of tutorials on state-of-the-art computer vision models and techniques. Explore everything from foundational architectures like ResNet to cutting-edge models like YOLO11, RT-DETR, SAM 2, Florence-2, PaliGemma 2, and Qwen2.5VL.
Language: Jupyter Notebook - Size: 463 MB - Last synced at: 3 days ago - Pushed at: 17 days ago - Stars: 7,651 - Forks: 1,198

balrog-ai/BALROG
Benchmarking Agentic LLM and VLM Reasoning On Games
Language: Python - Size: 1.51 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 138 - Forks: 26

shreydan/simpleVLM
Building a simple VLM. Implementing LLaMA-SmolLM2 from scratch + SigLip2 Vision Model. KV-Caching is also supported and implemented from scratch.
Language: Jupyter Notebook - Size: 7.33 MB - Last synced at: 3 days ago - Pushed at: 4 days ago - Stars: 3 - Forks: 0

AXERA-TECH/ax-llm
Explore LLM model deployment based on AXera's AI chips
Language: C++ - Size: 10.4 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 102 - Forks: 17

mtkresearch/BreezeApp
BreezeAPP is a purely on-device AI application developed for Android and iOS. Download it from the App Store to use a range of AI features without an internet connection. The source code is provided by MediaTek Research. We aim to promote two ideas: anyone can freely choose and run different LLMs on their own phone, and any app developer can easily create purely phone-based AI apps.
Language: Java - Size: 221 MB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 97 - Forks: 6

LoupXpro/AlphaExtract
AlphaExtract is a sophisticated PDF summarization tool that combines cutting-edge AI technology with efficient document processing. The project is built using Python and leverages Meta's LLaMA 4 MOE Maverick model along with Groq's inference engine to provide fast and accurate PDF summaries.
Language: Python - Size: 12.6 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

mbzuai-oryx/GeoChat
[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing
Language: Python - Size: 62.3 MB - Last synced at: 3 days ago - Pushed at: 5 months ago - Stars: 562 - Forks: 49

jeremyarancio/VLM-Batch-Deployment
Batch Deployment for Document Parsing with AWS Batch & Qwen-2.5-VL
Language: Jupyter Notebook - Size: 398 KB - Last synced at: 4 days ago - Pushed at: 13 days ago - Stars: 36 - Forks: 13

BAAI-Agents/Cradle
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.
Language: Python - Size: 433 MB - Last synced at: 3 days ago - Pushed at: 6 months ago - Stars: 2,088 - Forks: 185

kornia/bubbaloop
🦄 Serving Platform for Spatial AI and Robotics.
Language: Rust - Size: 1.17 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 12 - Forks: 4

KRproject-tech/FSI_by_FEM_and_UVLM
Fluid-Structure Interaction Analysis Using FEM and UVLM
Language: MATLAB - Size: 29.3 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 24 - Forks: 2

BAAI-Agents/GPA-LM
This repo is a live list of papers on game playing and large multimodality model - "A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges".
Size: 3.81 MB - Last synced at: 4 days ago - Pushed at: 8 months ago - Stars: 143 - Forks: 7

aimagelab/ReflectiVA
[CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
Language: Python - Size: 7.17 MB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 28 - Forks: 0

modelscope/evalscope
A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
Language: Python - Size: 58 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 892 - Forks: 100

ThuCCSLab/Awesome-LM-SSP
A reading list for large models safety, security, and privacy (including Awesome LLM Security, Safety, etc.).
Size: 2.46 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1,386 - Forks: 88

jonatanelmaspro2023/ailert-nextjs
This repository contains the frontend code for Ailert.tech, built on Next.js, Tailwind CSS, and Python.
Size: 1000 Bytes - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Language: Python - Size: 17.9 MB - Last synced at: 5 days ago - Pushed at: 6 days ago - Stars: 13,942 - Forks: 1,659

coderonion/awesome-cuda-and-hpc
🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.
Size: 55.7 KB - Last synced at: 4 days ago - Pushed at: 14 days ago - Stars: 258 - Forks: 30

SkyworkAI/Skywork-R1V
Skywork-R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning (the best multimodal reasoning)
Language: Python - Size: 36.3 MB - Last synced at: 6 days ago - Pushed at: 13 days ago - Stars: 2,405 - Forks: 236

NVIDIA-Omniverse-blueprints/3d-conditioning
Enhance and modify high-quality compositions using real-time rendering and generative AI output without affecting a hero product asset.
Language: Python - Size: 63.1 MB - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 81 - Forks: 14

declare-lab/Emma-X
Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning
Language: Python - Size: 32.7 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 60 - Forks: 4

mbodiai/embodied-agents
Seamlessly integrate state-of-the-art transformer models into robotics stacks
Language: Python - Size: 75.2 MB - Last synced at: about 3 hours ago - Pushed at: 17 days ago - Stars: 207 - Forks: 22

TIGER-AI-Lab/VisualWebInstruct
The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search"
Language: Python - Size: 7.82 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 24 - Forks: 1

shure-dev/Awesome-LLM-Papers-Comprehensive-Topics
Awesome LLM Papers and repos on very comprehensive topics.
Size: 450 KB - Last synced at: 6 days ago - Pushed at: 9 months ago - Stars: 217 - Forks: 22

vlm-run/vlmrun-cookbook
Examples and guides for using the VLM Run API
Language: Jupyter Notebook - Size: 50 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 275 - Forks: 12

gokayfem/awesome-vlm-architectures
Famous Vision Language Models and Their Architectures
Language: Markdown - Size: 2.26 MB - Last synced at: 7 days ago - Pushed at: 3 months ago - Stars: 804 - Forks: 42

RobotecAI/rai
RAI is an agentic framework for robotics, utilizing Langchain and ROS 2 tools to perform complex actions, defined scenarios, free interface execution, log summaries, voice interaction and more.
Language: Python - Size: 51.9 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 307 - Forks: 39

apple/ml-space-benchmark
Code and data for "Does Spatial Cognition Emerge in Frontier Models?"
Language: Python - Size: 931 KB - Last synced at: 4 days ago - Pushed at: 23 days ago - Stars: 13 - Forks: 0

thisisiron/LLaVA-Pool
🌋 A flexible framework for training and configuring Vision-Language Models
Language: Python - Size: 3.09 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 1 - Forks: 0

Yugsolanki/DocuSense
DocuSense is a powerful PDF parsing tool using LLMs and VLMs to extract, process, and summarize content from text and image-based PDFs.
Language: Python - Size: 793 KB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 0 - Forks: 0

FennelFetish/qapyq
An image viewer and AI-assisted editing/captioning/masking tool that helps with curating datasets for generative AI models, finetunes and LoRA.
Language: Python - Size: 1.44 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 121 - Forks: 5

GodotMisogi/AeroFuse.jl
A toolbox meant for aircraft design analyses.
Language: Julia - Size: 17.1 MB - Last synced at: 1 day ago - Pushed at: 12 months ago - Stars: 43 - Forks: 9

2dameneko/ide-cap-chan
ide-cap-chan is a utility for batch image captioning with natural language using various VL models
Language: Python - Size: 1.82 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 11 - Forks: 0

alessandropier/Personalized-Fashion-Ads
VLM and LDM based platform for the generation of personalized advertising images in the fashion industry, with evaluation through statistical and choice models | Bachelor's Thesis
Language: Python - Size: 354 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 0 - Forks: 0

yueliu1999/Awesome-Jailbreak-on-LLMs
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, datasets, evaluations, and analyses.
Size: 508 KB - Last synced at: 8 days ago - Pushed at: 16 days ago - Stars: 655 - Forks: 58

CVHub520/X-AnyLabeling
Effortless data labeling with AI support from Segment Anything and other awesome models.
Language: Python - Size: 114 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 5,404 - Forks: 601

coderonion/awesome-llm-and-aigc
🚀🚀🚀 A collection of some awesome public projects about Large Language Model (LLM), Vision Language Model (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), the related Datasets and Applications.
Size: 268 KB - Last synced at: 9 days ago - Pushed at: 14 days ago - Stars: 672 - Forks: 60

zubair-irshad/Awesome-Robotics-3D
A curated list of 3D Vision papers relating to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites
Size: 730 KB - Last synced at: 8 days ago - Pushed at: 6 months ago - Stars: 687 - Forks: 35

vlm-run/vlmrun-hub
A hub for various industry-specific schemas to be used with VLMs.
Language: Python - Size: 352 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 501 - Forks: 23

coderonion/awesome-yolo-object-detection
🚀🚀🚀 A collection of some awesome public YOLO object detection series projects and the related object detection datasets.
Size: 388 KB - Last synced at: 8 days ago - Pushed at: 26 days ago - Stars: 1,463 - Forks: 202

TideDra/VL-RLHF
A RLHF Infrastructure for Vision-Language Models
Language: Python - Size: 3.8 MB - Last synced at: 6 days ago - Pushed at: 6 months ago - Stars: 173 - Forks: 7

manycore-research/CAD2Program
[AAAI 2025] From 2D CAD Drawings to 3D Parametric Models: A Vision-Language Approach
Language: JavaScript - Size: 17.6 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 9 - Forks: 2

SabaPivot/ProjectionTrainer
Training a Siglip-based vision-language model from scratch.
Language: Python - Size: 150 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 0 - Forks: 0

Liquid4All/on-prem-stack
Scripts to launch Liquid on-prem stack
Language: Shell - Size: 195 KB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 2 - Forks: 1

DataEval/dingo
Dingo: A Comprehensive Data Quality Evaluation Tool
Language: JavaScript - Size: 15.7 MB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 137 - Forks: 19

NVIDIA-AI-Blueprints/video-search-and-summarization
Blueprint for ingesting massive volumes of live or archived videos and extracting insights for summarization and interactive Q&A
Language: Python - Size: 3.03 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 38 - Forks: 14

TIGER-AI-Lab/VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning"
Language: Python - Size: 4.92 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 75 - Forks: 1

thubZ09/All-Things-Multimodal
Hub for researchers exploring VLMs and Multimodal Learning:)
Size: 48 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 26 - Forks: 1

MDGrey33/pyvisionai
The PyVisionAI Official Repo
Language: Python - Size: 9.93 MB - Last synced at: 4 days ago - Pushed at: 2 months ago - Stars: 102 - Forks: 11

xlang-ai/Spider2-V
[NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
Language: Jupyter Notebook - Size: 134 MB - Last synced at: 11 days ago - Pushed at: 9 months ago - Stars: 123 - Forks: 7

katha-ai/VELOCITI
VELOCITI Benchmark Evaluation and Visualisation Code
Language: Python - Size: 186 KB - Last synced at: 12 days ago - Pushed at: 22 days ago - Stars: 6 - Forks: 0

neka-nat/cad3dify
2D to 3D CAD Conversion Using VLM
Language: Python - Size: 167 KB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 71 - Forks: 12

Open-Social-World/EgoNormia
EgoNormia | Benchmarking Physical Social Norm Understanding in VLMs
Language: Jupyter Notebook - Size: 11.3 MB - Last synced at: 14 days ago - Pushed at: 14 days ago - Stars: 8 - Forks: 0

km1994/AwesomeMultiModel
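[AIGC Hands-On Beginner Notes: The AIGC Skyscraper] Sharing hands-on practice and experience with large language models (LLMs), efficient fine-tuning of large models (SFT), retrieval-augmented generation (RAG), agents, automatic PPT generation, role-playing, text-to-image (Stable Diffusion), optical character recognition (OCR), automatic speech recognition (ASR), text-to-speech (TTS), portrait segmentation (SA), multimodal models (VLM), AI face swapping, text-to-video (VD), image-to-video (SVD), AI motion transfer, AI virtual try-on, digital humans, omni-modal understanding (Omni), AI music generation, and more.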
Size: 4.88 KB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 3 - Forks: 0

RauhanAhmed/AlphaExtract
AlphaExtract is a sophisticated PDF summarization tool that combines cutting-edge AI technology with efficient document processing. The project is built using Python and leverages Meta's LLaMA 4 MOE Maverick model along with Groq's inference engine to provide fast and accurate PDF summaries.
Language: Python - Size: 5.58 MB - Last synced at: 16 days ago - Pushed at: 16 days ago - Stars: 0 - Forks: 0

ammarlodhi255/Chest-xray-report-generation-app-with-chatbot-end-to-end-implementation
AI-powered Chest X-ray report generation app using VLM (Swin-T5) and LLM (LLaMA-3) for multilingual Q&A and medical education support.
Language: Jupyter Notebook - Size: 25.1 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 0 - Forks: 0

JackYFL/awesome-VLLMs
This repository collects papers on VLLM applications. We will update new papers irregularly.
Size: 934 KB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 97 - Forks: 9

taco-group/LangCoop
Official implementation of LangCoop: Collaborative Driving with Natural Language
Language: Python - Size: 53.9 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 7 - Forks: 0

BAAI-DCAI/Bunny
A family of lightweight multimodal models.
Language: Python - Size: 28.5 MB - Last synced at: 9 days ago - Pushed at: 6 months ago - Stars: 1,015 - Forks: 74

ProGamerGov/VLM-Captioning-Tools
Python scripts to use for captioning images with VLMs
Language: Python - Size: 21.5 KB - Last synced at: 18 days ago - Pushed at: 18 days ago - Stars: 39 - Forks: 0

taco-group/Re-Align
A novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models.
Language: Python - Size: 18.6 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 40 - Forks: 1

smaranjitghose/SightGuardAI
Capitalizing on moondream's capabilities to build a CCTV frame-on-framer analyzer
Language: Python - Size: 1.24 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 1 - Forks: 0

cirocavani/smartrobot-rpi
AI Project using RPi 5 SBC, Hugging Face Candle, GStreamer Framework, Eclipse Zenoh and K3s.
Language: Rust - Size: 474 KB - Last synced at: 17 days ago - Pushed at: 19 days ago - Stars: 0 - Forks: 0

iris0329/SeeGround
[CVPR'25] SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding
Language: Python - Size: 97.9 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 104 - Forks: 2

worldcuisines/worldcuisines
WorldCuisines is an extensive multilingual and multicultural benchmark that spans 30 languages, covering a wide array of global cuisines. (NAACL 2025 Main Conference)
Language: Jupyter Notebook - Size: 337 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 18 - Forks: 4

MSR3D/MSR3D
[NeurIPS 2024] Official code repository for MSR3D paper
Language: Python - Size: 75.7 MB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 50 - Forks: 2

ndurner/oai_chat
Multi-modal Chatbot based on OpenAI
Language: Python - Size: 128 KB - Last synced at: 21 days ago - Pushed at: 21 days ago - Stars: 4 - Forks: 0

om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
Language: Python - Size: 49 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 4,691 - Forks: 291

hasanar1f/HiRED
[AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Vision-Language Models (e.g., LLaVA-Next) under a fixed token budget.
Language: Python - Size: 23.9 MB - Last synced at: 22 days ago - Pushed at: 23 days ago - Stars: 29 - Forks: 4

miccunifi/Cross-the-Gap
[ICLR 2025] - Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
Language: Python - Size: 23.2 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 36 - Forks: 0

camUrban/PteraSoftware
Ptera Software is a fast, easy-to-use, and open-source software package for analyzing flapping-wing flight.
Language: Python - Size: 183 MB - Last synced at: 17 days ago - Pushed at: 5 months ago - Stars: 193 - Forks: 40

JoeJoe1313/LLMs-Journey
Various LLM resources and experiments
Language: Jupyter Notebook - Size: 17.7 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 1 - Forks: 0

przeprogramowani/10x-test-planner
A Node-based CLI tool to generate test plans from video recordings using Google's Gemini models.
Language: TypeScript - Size: 8.13 MB - Last synced at: 21 days ago - Pushed at: 25 days ago - Stars: 1 - Forks: 0

Hon-Wong/VoRA
[Fully open] [Encoder-free MLLM] Vision as LoRA
Language: Python - Size: 9.08 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 103 - Forks: 6

joanrod/star-vector
StarVector is a foundation model for SVG generation that transforms vectorization into a code generation task. Using a vision-language modeling architecture, StarVector processes both visual and textual inputs to produce high-quality SVG code with remarkable precision.
Language: Python - Size: 6.3 MB - Last synced at: 25 days ago - Pushed at: 25 days ago - Stars: 3,570 - Forks: 186

haoranD/Awesome-Embodied-AI
A curated list of awesome papers on Embodied AI and related research/industry-driven resources.
Size: 116 KB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 416 - Forks: 16

tejas-54/Visual-Search-Engine-Using-VLM
Visual Search Engine using VLM (Vision-Language Model) A powerful visual search system that lets users find similar items through image and text queries.
Language: Python - Size: 2.26 MB - Last synced at: 28 days ago - Pushed at: 28 days ago - Stars: 0 - Forks: 0

lamalab-org/macbench
Probing the limitations of multimodal language models for chemistry and materials research
Language: Python - Size: 2.18 GB - Last synced at: 27 days ago - Pushed at: about 2 months ago - Stars: 14 - Forks: 0

baaivision/EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
Language: Python - Size: 6.95 MB - Last synced at: 28 days ago - Pushed at: 2 months ago - Stars: 320 - Forks: 8

QiuYannnn/Local-File-Organizer
An AI-powered file management tool that ensures privacy by organizing local texts, images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval.
Language: Python - Size: 27.2 MB - Last synced at: 29 days ago - Pushed at: 7 months ago - Stars: 2,198 - Forks: 179

BjornMelin/pdf-ocr-streamlit
Language: Python - Size: 6.84 KB - Last synced at: 29 days ago - Pushed at: 29 days ago - Stars: 0 - Forks: 0

uzh-dqbm-cmi/RadVLM
A Multitask Conversational Vision-Language Model for Radiology
Language: Python - Size: 83.7 MB - Last synced at: 30 days ago - Pushed at: 30 days ago - Stars: 3 - Forks: 0

TIGER-AI-Lab/Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR2024]
Language: Python - Size: 83.6 MB - Last synced at: 29 days ago - Pushed at: about 2 months ago - Stars: 211 - Forks: 20

peterdsharpe/AeroSandbox
Aircraft design optimization made fast through computational graph transformations (e.g., automatic differentiation). Composable analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.
Language: Jupyter Notebook - Size: 197 MB - Last synced at: 29 days ago - Pushed at: 3 months ago - Stars: 836 - Forks: 140
