GitHub topics: vision-language-pretraining
SiyuanYan1/Derm1M
[ICCV'25 Highlight] Derm1M: A Million‑Scale Vision‑Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology
Language: Python - Size: 5.47 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 24 - Forks: 2

SiyuanYan1/MAKE
[MICCAI'25 Early Accept] MAKE: Multi-Aspect Knowledge-Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment
Language: Python - Size: 2.31 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 7 - Forks: 0

mbzuai-oryx/Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous quantitative evaluation benchmark for video-based conversational models.
Language: Python - Size: 79.1 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 1,430 - Forks: 117

DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Language: Python - Size: 19.6 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 3,061 - Forks: 280

mbzuai-oryx/VideoGPT-plus
Official repository of the paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
Language: Python - Size: 16.5 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 285 - Forks: 19

Fr0zenCrane/UniCoT
Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
Language: Python - Size: 86.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 67 - Forks: 1

deepseek-ai/Janus
Janus-Series: Unified Multimodal Understanding and Generation Models
Language: Python - Size: 6.98 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 17,478 - Forks: 2,243

jaisidhsingh/LoRA-CLIP
Easy wrapper for inserting LoRA layers in CLIP.
Language: Python - Size: 60.5 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 35 - Forks: 2
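LoRA (low-rank adaptation) keeps a pretrained weight matrix W frozen and learns a small low-rank update, so a wrapped linear layer computes W x + (alpha / r) B A x. The sketch below illustrates that idea under stated assumptions. It is not the LoRA-CLIP repository's API. The class name `LoRALinear`, its parameters, and the pure-Python list math are hypothetical. A real wrapper would replace PyTorch linear or attention projections inside CLIP.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Hypothetical LoRA wrapper around a frozen linear layer y = W x.

    Adds a trainable rank-r update B @ A scaled by alpha / r. B starts at
    zero, so the wrapped layer initially reproduces the frozen layer exactly.
    """

    def __init__(self, weight: Matrix, r: int = 4, alpha: float = 8.0, seed: int = 0):
        self.weight = weight  # frozen weight, shape d_out x d_in
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A: r x d_in with small random values; B: d_out x r with zeros.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)              # frozen path: W x
        delta = matvec(self.B, matvec(self.A, x))  # low-rank path: B (A x)
        return [b + self.scale * d for b, d in zip(base, delta)]


if __name__ == "__main__":
    W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
    layer = LoRALinear(W, r=2, alpha=4.0)
    print(layer.forward([1.0, 2.0, 3.0]))  # equals W x at initialization: [7.0, -1.0]
```

Only A and B (r * (d_in + d_out) values) would be trained. Zero-initializing B means fine-tuning starts from the unchanged pretrained model.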

thisisiron/LLaVA-Pool
🌋 A flexible framework for training and configuring Vision-Language Models
Language: Python - Size: 3.17 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

BridgeVLA/BridgeVLA
✨✨ Official implementation of BridgeVLA
Language: Python - Size: 303 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 64 - Forks: 5

jusiro/FLAIR
[MedIA'25] FLAIR: A Foundation LAnguage-Image model of the Retina for fundus image understanding.
Language: Python - Size: 1.53 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 130 - Forks: 15

marslanm/Multimodality-Representation-Learning
This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the accompanying survey https://dl.acm.org/doi/abs/10.1145/3617833.
Size: 63.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 75 - Forks: 7

sail-sg/ptp
[CVPR2023] Code for "Position-guided Text Prompt for Vision-Language Pre-training"
Language: Python - Size: 2.37 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 152 - Forks: 4

deepseek-ai/DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Language: Python - Size: 12.2 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 3,845 - Forks: 569

mvish7/AlignVLM
This repository contains the implementation of the AlignVLM paper, which proposes a novel method for vision-language alignment.
Language: Python - Size: 276 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

salesforce/LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
Language: Jupyter Notebook - Size: 79.3 MB - Last synced at: 4 months ago - Pushed at: 10 months ago - Stars: 10,558 - Forks: 1,031

YyzHarry/vlm-fairness
[Science Advances] Demographic Bias of Vision-Language Foundation Models in Medical Imaging
Language: Python - Size: 1.3 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 16 - Forks: 3

PGSmall/clip-pgs
Official code for CVPR2025 "Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection"
Language: Python - Size: 8.97 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 1

Sense-GVT/DeCLIP
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Language: Python - Size: 970 KB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 649 - Forks: 32

TencentARC/FLM
Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)
Language: Python - Size: 7 MB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 32 - Forks: 1

Surrey-UP-Lab/RegionSpot
Recognize Any Regions
Language: Python - Size: 2.16 MB - Last synced at: 7 months ago - Pushed at: 9 months ago - Stars: 122 - Forks: 4

TXH-mercury/COSA
[ICLR2024] Code and models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
Language: Python - Size: 84.2 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 39 - Forks: 3

TXH-mercury/VALOR
[TPAMI2024] Code and models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Language: Python - Size: 75.6 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 268 - Forks: 15

ahmdtaha/distributed_sigmoid_loss
Unofficial implementation of Sigmoid Loss for Language Image Pre-Training
Language: Python - Size: 62.5 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 10 - Forks: 0
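The sigmoid loss named in the entry above treats every image-text pair in a batch as an independent binary classification. Matched pairs (i = j) get label z = +1, and all other pairs get z = -1. The loss is -(1/n) Σ_i Σ_j log σ(z_ij (t · x_i·y_j + b)), with temperature t and bias b. The sketch below is a single-device illustration, not that repository's distributed code. The function names, the pure-Python list math, and the defaults t = 10 and b = -10 (which we believe match the paper's suggested initialization, with t learned as exp(t')) are assumptions.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def neg_log_sigmoid(x: float) -> float:
    """Numerically stable -log(sigmoid(x)) = log(1 + exp(-x))."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def sigmoid_loss(img: List[Vector], txt: List[Vector],
                 t: float = 10.0, b: float = -10.0) -> float:
    """Pairwise sigmoid loss over a batch of n image/text embeddings.

    img[i] and txt[i] are assumed to be a matching pair. Embeddings are
    L2-normalized before taking dot products.
    """
    n = len(img)
    img = [l2_normalize(v) for v in img]
    txt = [l2_normalize(v) for v in txt]
    total = 0.0
    for i in range(n):
        for j in range(n):
            z = 1.0 if i == j else -1.0  # +1 for matched pairs, -1 otherwise
            logit = t * sum(a * c for a, c in zip(img[i], txt[j])) + b
            total += neg_log_sigmoid(z * logit)
    return total / n  # normalized by batch size n, not n * n
```

Unlike CLIP's softmax contrastive loss, each term depends on only one pair, with no normalization across the batch. This property is what makes computing the loss in chunks across devices straightforward.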

ChenDelong1999/ITRA
A codebase for flexible and efficient Image Text Representation Alignment
Language: Python - Size: 5.62 MB - Last synced at: 10 months ago - Pushed at: about 2 years ago - Stars: 16 - Forks: 1

unitaryai/VTC
VTC: Improving Video-Text Retrieval with User Comments
Language: Python - Size: 5.45 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 11 - Forks: 0

BUAADreamer/CCRK
[KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning
Language: Python - Size: 644 KB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

HieuPhan33/CVPR2024_MAVL
Multi-Aspect Vision Language Pretraining - CVPR2024
Language: Python - Size: 7.84 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 42 - Forks: 0

unitaryai/VTC-dataset
Language: Python - Size: 38.1 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

yiren-jian/BLIText
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Language: Python - Size: 34.4 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 17 - Forks: 1

Zoky-2020/SGA
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models. [ICCV 2023 Oral]
Language: Python - Size: 7.64 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 37 - Forks: 2

megvii-research/protoclip
📍 Official PyTorch implementation of ProtoCLIP from the paper "Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS)
Language: Python - Size: 2.58 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 38 - Forks: 0

LooperXX/ManagerTower
Code for the ACL 2023 oral paper ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning
Language: Python - Size: 6.71 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

adarobustness/adaptation_robustness
Evaluate robustness of adaptation methods on large vision-language models
Language: Shell - Size: 1.66 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0

omipan/svl_adapter
SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models
Language: Python - Size: 112 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 17 - Forks: 2

ArrowLuo/SegCLIP
PyTorch implementation of ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation"
Language: Python - Size: 2.48 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 35 - Forks: 1

alinlab/b2t
Explaining Visual Biases as Words by Generating Captions
Language: Python - Size: 2.98 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

vgthengane/Continual-CLIP
Official repository for "CLIP model is an Efficient Continual Learner".
Language: Python - Size: 1.62 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 33 - Forks: 0
