An open API service providing repository metadata for many open source software ecosystems.

Topic: "visual-language-models"

THUDM/CogVLM

a state-of-the-art-level open visual language model | multimodal pre-trained model

Language: Python - Size: 25.8 MB - Last synced at: 5 days ago - Pushed at: 12 months ago - Stars: 6,541 - Forks: 429

camel-ai/crab

🦀️ CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/

Language: Python - Size: 7.89 MB - Last synced at: 4 days ago - Pushed at: 14 days ago - Stars: 339 - Forks: 50

bilel-bj/ROSGPT_Vision

Commanding robots using only Language Models' prompts

Language: Python - Size: 21.4 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 98 - Forks: 13

hk-zh/language-conditioned-robot-manipulation-models

https://arxiv.org/abs/2312.10807

Size: 1.28 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 51 - Forks: 1

jaisidhsingh/CoN-CLIP

Implementation of the "Learn No to Say Yes Better" paper.

Language: Python - Size: 4.36 MB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 31 - Forks: 2

tianyu-z/VCR

Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.

Language: Python - Size: 6.58 MB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 31 - Forks: 2

AlignGPT-VL/AlignGPT

Official repo for "AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability"

Language: Python - Size: 1.97 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 28 - Forks: 3

kesimeg/awesome-turkish-language-models

A curated list of Turkish AI models, datasets, papers

Size: 4.88 KB - Last synced at: 11 days ago - Pushed at: 11 days ago - Stars: 27 - Forks: 0

csebuetnlp/IllusionVQA

This repository contains the data and code of the paper titled "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models"

Language: Jupyter Notebook - Size: 87.7 MB - Last synced at: 22 days ago - Pushed at: 22 days ago - Stars: 13 - Forks: 2

BioMedIA-MBZUAI/FetalCLIP

Official repository of FetalCLIP: A Visual-Language Foundation Model for Fetal Ultrasound Image Analysis

Language: Python - Size: 12.6 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 13 - Forks: 0

sduzpf/UAP_VLP

Universal Adversarial Perturbations for Vision-Language Pre-trained Models

Language: Python - Size: 14.8 MB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 13 - Forks: 0

Sid2697/HOI-Ref

Code implementation for the paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"

Language: Python - Size: 6.73 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 13 - Forks: 2

amathislab/wildclip

Scene and animal attribute retrieval from camera trap data with domain-adapted vision-language models

Language: Python - Size: 5.27 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 12 - Forks: 0

declare-lab/Sealing

[NAACL 2024] Official implementation of the paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"

Language: Python - Size: 8.92 MB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 11 - Forks: 3

GraphPKU/CoI

Chain of Images for Intuitively Reasoning

Language: Python - Size: 5.17 MB - Last synced at: 8 days ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

ArthurBabkin/Parimate

A Telegram bot for validating audio and video content using CV models, SR models, and VLMs, with deepfake detection leveraging metadata analysis.

Language: Python - Size: 115 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 6 - Forks: 1

AikyamLab/hallucinogen

A benchmark for evaluating hallucinations in large visual language models

Language: Python - Size: 1.32 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 6 - Forks: 0

xinyanghuang7/Basic-Visual-Language-Model

Build a simple basic multimodal large model from scratch. 🤖

Language: Python - Size: 34.7 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 3 - Forks: 0

CristianoPatricio/concept-based-interpretability-VLM

Code for the paper "Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models", ISBI 2024 (Oral).

Language: Jupyter Notebook - Size: 21.2 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 3 - Forks: 0

ARResearch-1/DiverseAR-Dataset

Advancing the Understanding and Evaluation of AR-Generated Scenes: When Vision-Language Models Shine and Stumble

Size: 4.59 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

rng190001/CS6375-ResearchProject

A visual language model project focused on testing different techniques for parsing generated responses

Language: Jupyter Notebook - Size: 880 KB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 1 - Forks: 0

nkkbr/ViCA

Official Implementation of ViCA2

Language: Python - Size: 2.29 MB - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 0 - Forks: 0

rs-anderson/explicit-alignment-for-vqa-tasks

Explicit alignment for few-shot Visual Question Answering.

Language: Python - Size: 20 MB - Last synced at: 13 days ago - Pushed at: 13 days ago - Stars: 0 - Forks: 0

DavidLMS/DescribePDF

A tool to convert PDF files to detailed Markdown descriptions using VLMs

Language: Python - Size: 1.2 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 0 - Forks: 0

HemantM29/Multimodal-Document-Analysis-and-Query-Retrieval

This project performs multimodal document analysis and query retrieval by downloading PDFs, converting pages to images, indexing them for semantic search, and analyzing retrieved images using visual-language models like Qwen2VL and Blip2.

Language: Jupyter Notebook - Size: 1.9 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

laclouis5/uform-coreml-converters

CLI for converting UForm models to CoreML.

Language: Python - Size: 121 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
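
Each entry above ends with a metadata line of the form "Key: value - Key: value", where the Language field is sometimes absent. As a minimal sketch of turning one of those lines into structured data, assuming the separators stay exactly as shown in this listing (the function name `parse_metadata` is hypothetical and is not part of any API described here):

```python
def parse_metadata(line: str) -> dict:
    """Split a 'Key: value - Key: value' metadata line into a dict.

    Assumes fields are separated by ' - ' and that each field has the
    form 'Key: value', as in the listing above.
    """
    fields = {}
    for part in line.split(" - "):
        key, _, value = part.partition(": ")
        fields[key.strip()] = value.strip()
    # Stars and Forks are counts that may carry thousands separators.
    for key in ("Stars", "Forks"):
        if key in fields:
            fields[key] = int(fields[key].replace(",", ""))
    return fields


line = ("Language: Python - Size: 25.8 MB - Last synced at: 5 days ago"
        " - Pushed at: 12 months ago - Stars: 6,541 - Forks: 429")
record = parse_metadata(line)
print(record["Language"], record["Stars"], record["Forks"])
```

Sizes and relative dates are left as strings here, since the listing mixes units (KB, MB) and approximate phrases such as "about 2 months ago". Entries without a Language field simply produce a dict without that key.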

Related Topics
large-language-models (7)
deep-learning (3)
multimodal-large-language-models (2), vlm (2), medical-imaging (2), foundation-models (2), multimodal (2), computer-vision (2), clip (2), transformers (2), chatgpt (2), llm (2), prompt-engineering (2)
chain-of-image (1), llama (1), llava (1), liveness-detection (1), face-recognition (1), deepfake-detection (1), audio-recognition (1), audio-processing (1), semantic-search (1), retrieval-augmented-generation (1), qwen2-vl (1), pdf-processing (1), natural-language-queries (1), multimodal-analysis (1), image-indexing (1), blip2 (1), ui-ux-design (1), tf-idf-vectorizer (1), research-project (1), parsing-algorithms (1), nlp-machine-learning (1), machine-learning (1), llava-next (1), cosine-similarity (1), bert-embeddings (1), bart (1), pretrained-models (1), multi-modal (1), multi-agent-systems (1), language-model-agent (1), gui-automation (1), huggingface-transformers (1), deepspeed (1), medical-visual-language-model (1), chain-of-throught (1), medical-safety (1), hallucination-evaluation (1), hallucination-detection (1), aisafety (1), ai (1), video-understanding (1), video-question-answering (1), naacl2024 (1), multimodality (1), pytorch (1), chatbot (1), dalle3 (1), gpt4v (1), image-text-matching (1), image-captions (1), compositionality (1), telegram-bot (1), speech-recognition (1), postgresql (1), natural-language-processing (1), mvp (1), ultrasound-imaging (1), fetalclip (1), fetal-ultrasound (1), artificial-intelligence (1), computervision (1), camera-trap (1), behavior (1), visual-language-learning (1), multimodel-large-language-model (1), deep-neural-networks (1), adversarial-attacks (1), vqa-dataset (1), vqa (1), optical-illusions (1), uform (1), coremltools (1), coreml (1), tool (1), pdf-document-processor (1), gradio-python-app (1), cli (1), ros2 (1), robotics (1), robotic-vision (1), robotic-design-patterns (1), prompting-robotic-modalities (1), language-models-are-next (1), language-models (1), language-model (1), cross-modality (1), turkish-nlp (1)