An open API service providing repository metadata for many open source software ecosystems.

Topic: "image-text-retrieval"

OpenGVLab/InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.

Language: Python - Size: 38.5 MB - Last synced at: 19 days ago - Pushed at: 25 days ago - Stars: 7,817 - Forks: 591

salesforce/BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Language: Jupyter Notebook - Size: 6.34 MB - Last synced at: about 1 month ago - Pushed at: 9 months ago - Stars: 5,165 - Forks: 681

OFA-Sys/Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Language: Python - Size: 2.5 MB - Last synced at: 12 days ago - Pushed at: 9 months ago - Stars: 5,142 - Forks: 496

Paranioar/Awesome_Matching_Pretraining_Transfering

A paper list covering large multi-modality models (perception, generation, unification), parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching, intended for preliminary insight.

Size: 369 KB - Last synced at: 5 days ago - Pushed at: 5 months ago - Stars: 423 - Forks: 48

greyovo/PicQuery

🔍 Search local images with natural language on Android, powered by OpenAI's CLIP model.

Language: Kotlin - Size: 49.1 MB - Last synced at: 23 days ago - Pushed at: 3 months ago - Stars: 398 - Forks: 44

Paranioar/SGRAF

[AAAI2021] The code of “Similarity Reasoning and Filtration for Image-Text Matching”

Language: Python - Size: 794 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 197 - Forks: 37

chuhaojin/Text2Poster-ICASSP-22

Official implementation of the ICASSP-2022 paper "Text2Poster: Laying Out Stylized Texts on Retrieved Images"

Language: Python - Size: 50.1 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 171 - Forks: 12

alipay/Ant-Multi-Modal-Framework

Research Code for Multimodal-Cognition Team in Ant Group

Language: Python - Size: 17 MB - Last synced at: 17 days ago - Pushed at: 10 months ago - Stars: 143 - Forks: 5

howard-hou/BagFormer

PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction

Language: Python - Size: 3.44 MB - Last synced at: about 11 hours ago - Pushed at: over 2 years ago - Stars: 99 - Forks: 33

X-PLUG/mPLUG

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)

Language: Python - Size: 1.56 MB - Last synced at: 2 months ago - Pushed at: about 2 years ago - Stars: 89 - Forks: 7

MILVLG/rosita

ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration

Language: Python - Size: 15.9 MB - Last synced at: 12 months ago - Pushed at: almost 2 years ago - Stars: 55 - Forks: 13

slavabarkov/tidy

Offline semantic Text-to-Image and Image-to-Image search on Android, powered by a quantized state-of-the-art vision-language pretrained CLIP model and the ONNX Runtime inference engine

Language: Kotlin - Size: 99.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 27 - Forks: 5

sdc17/CrossGET

[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.

Language: Python - Size: 11.6 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 26 - Forks: 0

eric-ai-lab/ComCLIP

Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"

Language: Python - Size: 7.86 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 22 - Forks: 0

eric-ai-lab/CPL

Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"

Language: Python - Size: 2.59 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 22 - Forks: 4

Paranioar/RCAR

[TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching”

Language: Python - Size: 1.72 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 21 - Forks: 2

alipay/PC2-NoiseofWeb

Noise of Web (NoW) is a challenging noisy correspondence learning (NCL) benchmark containing 100K image-text pairs for robust image-text matching/retrieval models.

Language: Python - Size: 13.6 MB - Last synced at: 17 days ago - Pushed at: 6 months ago - Stars: 12 - Forks: 1

hpc203/Chinese-CLIP-opencv-onnxrun

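Deploys Chinese CLIP with OpenCV and onnxruntime for text-to-image search: given a sentence describing the desired image, it retrieves matching images from a gallery. Includes both C++ and Python versions of the program.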

Language: C++ - Size: 4.03 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 1

cobanov/image-captioning

Image captioning using python and BLIP

Language: Python - Size: 28.2 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 10 - Forks: 3

frank-chris/ImageTextRetrieval

In this work, we implement different cross-modal learning schemes such as Siamese Network, Correlational Network and Deep Cross-Modal Projection Learning model and study their performance. We also propose a modified Deep Cross-Modal Projection Learning model that uses a different image feature extractor. We evaluate the model’s performance on image-text retrieval on a fashion clothing dataset.

Language: Jupyter Notebook - Size: 6.88 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 2

Paranioar/DBL

[TIP2024] The code of “Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching”

Language: Python - Size: 783 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 6 - Forks: 0

kaylode/tern

Cross-modal Retrieval using Transformer Encoder Reasoning Networks (TERN), with Metric Learning and FAISS for fast similarity search on GPU

Language: Jupyter Notebook - Size: 7.23 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 1

Paranioar/GSSF

[TIP2024] The code of "GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning"

Size: 5.86 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 5 - Forks: 0

BUAADreamer/CCRK

[KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning

Language: Python - Size: 644 KB - Last synced at: 30 days ago - Pushed at: 10 months ago - Stars: 5 - Forks: 0

marialymperaiou/knowledge-enhanced-multimodal-learning

A list of research papers on knowledge-enhanced multimodal learning

Size: 20.5 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

whats2000/WeiMoCIR

Training-free Zero-shot Composed Image Retrieval via Weighted Modality Fusion and Similarity (TAAI 2024)

Language: Jupyter Notebook - Size: 454 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 1 - Forks: 0

LIU42/Contrastive

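Based on Problem B of the 2024 "Teddy Cup" Data Mining Challenge: a cross-modal image-text mutual retrieval model using contrastive learning in a shared feature space.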

Language: Python - Size: 20.5 KB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 1 - Forks: 0

mrzjy/GenshinCLIP

A simple open-sourced SigLIP model finetuned on Genshin Impact's image-text pairs.

Size: 1.06 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 1 - Forks: 0

Moenupa/clip-image-search

Searching Images: From Clip And Beyond

Language: Jupyter Notebook - Size: 21.9 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

wocns1457/CCTV-based-clothing-analysis-and-search-system

Language: Python - Size: 23.1 MB - Last synced at: about 2 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

AmMoPy/semantic-search-question-answer

Matching questions to correct answers using pre-trained BERT models.

Language: Jupyter Notebook - Size: 1.86 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Paranioar/Awesome_Image_Text_Retrieval_Benchmark

The Unified Code of Image-Text Retrieval for Further Exploration.

Language: Python - Size: 41 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

romrawinjp/modern-image-search

Modern Image Search's course repository for Super AI Engineer Development Program SS4

Language: Jupyter Notebook - Size: 12 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

Related Topics
cross-modal-retrieval (12), image-text-matching (12), clip (7), pytorch (7), image-retrieval (6), vision-and-language (5), deep-learning (5), image-captioning (5), visual-reasoning (4), vision-language (4), multi-modal (4), transformer (3), multimodal-learning (3), visual-question-answering (3), vision-language-transformer (3), vision-and-language-pre-training (3), vqa (3), image-search (3), text-matching (3), tip (3), nlp (3), image-text-search (3), computer-vision (3), image-processing (3), semantic-search (2), openclip (2), blip (2), video-text-retrieval (2), multimodal-large-language-models (2), android (2), benchmark (2), bert (2), image-classification (2), transformers (2), multi-modal-learning (2), multimodal (2), pretrained-models (2), tensorflow (1), multimodal-deep-learning (1), efficient-deep-learning (1), multi-task-learning (1), knowledge-graph (1), framework (1), knowledge-enhanced-vision-language (1), model-acceleration (1), text-image-retrieval (1), token-ensemble (1), token-matching (1), awesome-list (1), large-language-model (1), multimodal-retrieval (1), story-visualization (1), flask (1), vision-and-language-navigation (1), visual-commonsense-reasoning (1), cross-modal-learning (1), visual-dialog (1), sentence-transformers (1), natural-language-processing (1), fine-tuning (1), visual-grounding (1), bert-embeddings (1), training-free (1), visual-storytelling (1), composed-image-retrieval (1), cross-modal (1), iglue (1), kdd2024 (1), mscoco (1), multi30k (1), retrieval (1), swin-transformer (1), vision-language-pretraining (1), wit (1), xflickrco (1), xlm-roberta (1), chinese (1), contrastive-loss (1), coreml-models (1), jetpack-compose (1), material-design-3 (1), openai (1), large-language-models (1), large-vision-language-models (1), large-vision-models (1), memory-efficient-tuning (1), multimodal-pretraining (1), parameter-efficient-fine-tuning (1), text-to-image-generation (1), text-to-image-synthesis (1), text-to-video-generation (1), tutorial (1), video-text-recognition (1), visual-semantic-embedding (1), image-text (1), pretraining (1), visual-language (1), cross-lingual (1), cross-lingual-retrieval (1), cnn (1)