GitHub topics: vision-language-pretraining

Repositories

SiyuanYan1/Derm1M

[ICCV'25 Highlight] Derm1M: A Million‑Scale Vision‑Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology

Language: Python - Size: 5.47 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 24 - Forks: 2

SiyuanYan1/MAKE

[MICCAI'25 Early Accept] MAKE: Multi-Aspect Knowledge-Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment

Language: Python - Size: 2.31 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 7 - Forks: 0

[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Language: Python - Size: 79.1 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 1,430 - Forks: 117

DAMO-NLP-SG/Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Language: Python - Size: 19.6 MB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 3,061 - Forks: 280

mbzuai-oryx/VideoGPT-plus

Official repository for the paper "VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding"

Language: Python - Size: 16.5 MB - Last synced at: 6 days ago - Pushed at: about 1 month ago - Stars: 285 - Forks: 19

Fr0zenCrane/UniCoT

Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision

Language: Python - Size: 86.1 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 67 - Forks: 1

deepseek-ai/Janus

Janus-Series: Unified Multimodal Understanding and Generation Models

Language: Python - Size: 6.98 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 17,478 - Forks: 2,243

jaisidhsingh/LoRA-CLIP

Easy wrapper for inserting LoRA layers in CLIP.

Language: Python - Size: 60.5 KB - Last synced at: 10 days ago - Pushed at: about 1 year ago - Stars: 35 - Forks: 2

thisisiron/LLaVA-Pool

🌋 A flexible framework for training and configuring Vision-Language Models

Language: Python - Size: 3.17 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 1 - Forks: 0

BridgeVLA/BridgeVLA

✨✨Official implementation of BridgeVLA

Language: Python - Size: 303 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 64 - Forks: 5

jusiro/FLAIR

[MedIA'25] FLAIR: A Foundation LAnguage-Image model of the Retina for fundus image understanding.

Language: Python - Size: 1.53 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 130 - Forks: 15

marslanm/Multimodality-Representation-Learning

This repository provides a comprehensive collection of research papers on multimodal representation learning, all of which are cited and discussed in the accepted survey: https://dl.acm.org/doi/abs/10.1145/3617833

Size: 63.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 75 - Forks: 7

sail-sg/ptp

[CVPR2023] The code for "Position-guided Text Prompt for Vision-Language Pre-training"

Language: Python - Size: 2.37 MB - Last synced at: 2 months ago - Pushed at: over 2 years ago - Stars: 152 - Forks: 4

deepseek-ai/DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Language: Python - Size: 12.2 MB - Last synced at: 3 months ago - Pushed at: over 1 year ago - Stars: 3,845 - Forks: 569

mvish7/AlignVLM

This repository contains the implementation of the AlignVLM paper, which proposes a novel method for vision-language alignment.

Language: Python - Size: 276 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 2 - Forks: 0

salesforce/LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook - Size: 79.3 MB - Last synced at: 4 months ago - Pushed at: 10 months ago - Stars: 10,558 - Forks: 1,031

YyzHarry/vlm-fairness

[Science Advances] Demographic Bias of Vision-Language Foundation Models in Medical Imaging

Language: Python - Size: 1.3 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 16 - Forks: 3

PGSmall/clip-pgs

Official code for CVPR2025 "Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection"

Language: Python - Size: 8.97 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 0 - Forks: 1

Sense-GVT/DeCLIP

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

Language: Python - Size: 970 KB - Last synced at: 5 months ago - Pushed at: almost 3 years ago - Stars: 649 - Forks: 32

TencentARC/FLM

Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)

Language: Python - Size: 7 MB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 32 - Forks: 1

Surrey-UP-Lab/RegionSpot

Recognize Any Regions

Language: Python - Size: 2.16 MB - Last synced at: 7 months ago - Pushed at: 9 months ago - Stars: 122 - Forks: 4

TXH-mercury/COSA

[ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

Language: Python - Size: 84.2 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 39 - Forks: 3

TXH-mercury/VALOR

[TPAMI2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

Language: Python - Size: 75.6 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 268 - Forks: 15

ahmdtaha/distributed_sigmoid_loss

Unofficial implementation for Sigmoid Loss for Language Image Pre-Training

Language: Python - Size: 62.5 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 10 - Forks: 0

ChenDelong1999/ITRA

A codebase for flexible and efficient Image Text Representation Alignment

Language: Python - Size: 5.62 MB - Last synced at: 10 months ago - Pushed at: about 2 years ago - Stars: 16 - Forks: 1

unitaryai/VTC

VTC: Improving Video-Text Retrieval with User Comments

Language: Python - Size: 5.45 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 11 - Forks: 0

BUAADreamer/CCRK

[KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning

Language: Python - Size: 644 KB - Last synced at: 3 days ago - Pushed at: about 1 year ago - Stars: 5 - Forks: 0

HieuPhan33/CVPR2024_MAVL

Multi-Aspect Vision Language Pretraining - CVPR2024

Language: Python - Size: 7.84 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 42 - Forks: 0

unitaryai/VTC-dataset

Language: Python - Size: 38.1 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

yiren-jian/BLIText

[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training

Language: Python - Size: 34.4 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 17 - Forks: 1

Zoky-2020/SGA

Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models. [ICCV 2023 Oral]

Language: Python - Size: 7.64 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 37 - Forks: 2

megvii-research/protoclip

📍 Official PyTorch implementation of ProtoCLIP from the paper "Prototypical Contrastive Language Image Pretraining" (IEEE TNNLS)

Language: Python - Size: 2.58 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 38 - Forks: 0

LooperXX/ManagerTower

Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning

Language: Python - Size: 6.71 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 9 - Forks: 1

adarobustness/adaptation_robustness

Evaluate robustness of adaptation methods on large vision-language models

Language: Shell - Size: 1.66 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0

omipan/svl_adapter

SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models

Language: Python - Size: 112 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 17 - Forks: 2

ArrowLuo/SegCLIP

PyTorch implementation of ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation"

Language: Python - Size: 2.48 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 35 - Forks: 1

alinlab/b2t

Explaining Visual Biases as Words by Generating Captions

Language: Python - Size: 2.98 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 8 - Forks: 1

vgthengane/Continual-CLIP

Official repository for "CLIP model is an Efficient Continual Learner".

Language: Python - Size: 1.62 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 33 - Forks: 0

Related Keywords

vision-language-pretraining (38), vision-language-model (8), clip (6), multimodal-deep-learning (6), vision-language (4), foundation-models (4), multimodal (4), self-supervised-learning (3), llava (3), vision-language-transformer (3), deep-learning (3), contrastive-learning (3), computer-vision (2), llm (2), pytorch (2), unified-model (2), multimodal-datasets (2), open-vocabulary (2), medical-imaging (2), multimodal-representation-learning (2), any-to-any (2), cross-modal (2), dataset (2), dermatology-ai (2), video-text-retrieval (2), parameter-efficient-tuning (2), medical-image-analysis (2), zero-shot (2), video-understanding (2), chatbot (2), llama (2), vicuna (2), vision-and-language (2), video-conversation (2), python3 (1), distributed-data-parallel (1), unsupervised-learning (1), vision-transformer (1), zero-shot-classification (1), vlms (1), big-model (1), image-text (1), multi-model (1), self-supervised (1), language-modeling (1), auto-labeling (1), instance-segmentation (1), object-detection (1), open-world (1), vision-foundation-model (1), vision-language-foundation-model (1), video-captioning (1), video-language-pretrainng (1), video-qa (1), video-retrieval (1), audio-language-pretraining (1), audiovisual-language-pretraining (1), zero-shot-segmentation (1), vision-language-dataset (1), adversarial-attack (1), multi-modal-learning (1), vision-language-learning (1), adaptation (1), robustness (1), open-vocabulary-semantic-segmentation (1), semantic-segmentation (1), transfer-learning (1), zero-shot-semantic-segmentation (1), bias-and-fairness (1), explainable-ai (1), baseline (1), continual-learning (1), foundational-models (1), multimodal-learning (1), comments (1), cross-lingual (1), cross-lingual-retrieval (1), cross-modal-retrieval (1), iglue (1), image-text-retrieval (1), image-text-search (1), kdd2024 (1), mscoco (1), multi30k (1), retrieval (1), swin-transformer (1), wit (1), xflickrco (1), xlm-roberta (1), medical-vision-and-language-pretraining (1), lora (1), image-text-matching (1), unicot (1), uni-cot (1), cot (1), chain-of-thought-reasoning (1), chain-of-thought (1), artificial-intelligence (1), video-encoder (1), video-chatbot (1)