Topic: "vision-language-transformer"
salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Language: Jupyter Notebook - Size: 79.3 MB - Last synced at: 33 minutes ago - Pushed at: 6 months ago - Stars: 10,534 - Forks: 1,026

IDEA-Research/GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Language: Python - Size: 12.5 MB - Last synced at: 3 days ago - Pushed at: 9 months ago - Stars: 7,994 - Forks: 800

salesforce/BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Language: Jupyter Notebook - Size: 6.34 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 5,165 - Forks: 681

AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Language: C++ - Size: 104 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 1,685 - Forks: 191

henghuiding/ReLA
[CVPR2023 Highlight] GRES: Generalized Referring Expression Segmentation
Language: Python - Size: 2.06 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 693 - Forks: 19

shenyunhang/APE
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
Language: Python - Size: 49.3 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 490 - Forks: 29

henghuiding/Vision-Language-Transformer
[ICCV2021 & TPAMI2023] Vision-Language Transformer and Query Generation for Referring Segmentation
Language: Python - Size: 322 KB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 352 - Forks: 23

haoliuhl/instructrl
Instruction Following Agents with Multimodal Transformers
Language: Python - Size: 191 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 52 - Forks: 5

sdc17/CrossGET
[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.
Language: Python - Size: 11.6 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 26 - Forks: 0

sMamooler/CLIP_Explainability
Code for studying OpenAI's CLIP explainability
Language: Jupyter Notebook - Size: 470 MB - Last synced at: 10 months ago - Pushed at: over 3 years ago - Stars: 23 - Forks: 3

yiren-jian/BLIText
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Language: Python - Size: 34.4 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 17 - Forks: 1

unitaryai/VTC
VTC: Improving Video-Text Retrieval with User Comments
Language: Python - Size: 5.45 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 0

ThomasVonWu/Awesome-VLMs-Strawberry
A collection of VLMs papers, blogs, and projects, with a focus on VLMs in Autonomous Driving and related reasoning techniques.
Size: 760 KB - Last synced at: 2 days ago - Pushed at: 6 months ago - Stars: 10 - Forks: 1

akusayudodograu/Agentic-RAG-Story-Generation-with-Multimodal-GenAI
Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling
Size: 1000 Bytes - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 8 - Forks: 1

marialymperaiou/knowledge-enhanced-multimodal-learning
A list of research papers on knowledge-enhanced multimodal learning
Size: 20.5 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

fork123aniket/Multi-Round-VLM-powered-Multimodal-Conversational-AI-Navigation-Bot
Streamlit App Combining Vision, Language, and Audio AI Models
Language: Python - Size: 18.6 KB - Last synced at: 20 days ago - Pushed at: 3 months ago - Stars: 3 - Forks: 0

fork123aniket/Agentic-RAG-Story-Generation-with-Multimodal-GenAI
Multimodal Agentic GenAI Workflow – Seamlessly blends retrieval and generation for intelligent storytelling
Language: Python - Size: 94.7 KB - Last synced at: about 1 month ago - Pushed at: 3 months ago - Stars: 2 - Forks: 0

PrateekJannu/Vision-GPT
Coding a Multi-Modal vision model like GPT-4o from scratch, inspired by @hkproj and PaliGemma
Language: Python - Size: 591 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 1 - Forks: 0

atharva-naik/MMML-TermProject-VizWiz-VQA-Challenge
VizWiz Challenge Term Project for Multi Modal Machine Learning @ CMU (11777)
Language: Python - Size: 90.6 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0
