Topic: "vision-language-transformer"

salesforce/LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook - Size: 79.3 MB - Last synced at: 33 minutes ago - Pushed at: 6 months ago - Stars: 10,534 - Forks: 1,026

IDEA-Research/GroundingDINO

[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

Language: Python - Size: 12.5 MB - Last synced at: 3 days ago - Pushed at: 9 months ago - Stars: 7,994 - Forks: 800

salesforce/BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Language: Jupyter Notebook - Size: 6.34 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 5,165 - Forks: 681

AlibabaResearch/AdvancedLiterateMachinery

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

Language: C++ - Size: 104 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1,685 - Forks: 191

henghuiding/ReLA

[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation

Language: Python - Size: 2.06 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 693 - Forks: 19

shenyunhang/APE

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception

Language: Python - Size: 49.3 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 490 - Forks: 29

henghuiding/Vision-Language-Transformer

[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation

Language: Python - Size: 322 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 352 - Forks: 23

haoliuhl/instructrl

Instruction Following Agents with Multimodal Transformers

Language: Python - Size: 191 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 52 - Forks: 5

sdc17/CrossGET

[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.

Language: Python - Size: 11.6 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 26 - Forks: 0

sMamooler/CLIP_Explainability

Code for studying the explainability of OpenAI's CLIP

Language: Jupyter Notebook - Size: 470 MB - Last synced at: 10 months ago - Pushed at: over 3 years ago - Stars: 23 - Forks: 3

yiren-jian/BLIText

[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training

Language: Python - Size: 34.4 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 17 - Forks: 1

unitaryai/VTC

VTC: Improving Video-Text Retrieval with User Comments

Language: Python - Size: 5.45 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 0

ThomasVonWu/Awesome-VLMs-Strawberry

A collection of VLM papers, blogs, and projects, with a focus on VLMs in autonomous driving and related reasoning techniques.

Size: 760 KB - Last synced at: 2 days ago - Pushed at: 6 months ago - Stars: 10 - Forks: 1

akusayudodograu/Agentic-RAG-Story-Generation-with-Multimodal-GenAI

Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling

Size: 1000 Bytes - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 8 - Forks: 1

marialymperaiou/knowledge-enhanced-multimodal-learning

A list of research papers on knowledge-enhanced multimodal learning

Size: 20.5 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

fork123aniket/Multi-Round-VLM-powered-Multimodal-Conversational-AI-Navigation-Bot

Streamlit App Combining Vision, Language, and Audio AI Models

Language: Python - Size: 18.6 KB - Last synced at: 20 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

fork123aniket/Agentic-RAG-Story-Generation-with-Multimodal-GenAI

Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling

Language: Python - Size: 94.7 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

PrateekJannu/Vision-GPT

Coding a Multi-Modal vision model like GPT-4o from scratch, inspired by @hkproj and PaliGemma

Language: Python - Size: 591 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

atharva-naik/MMML-TermProject-VizWiz-VQA-Challenge

VizWiz Challenge Term Project for Multi Modal Machine Learning @ CMU (11777)

Language: Python - Size: 90.6 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

ecosyste.ms