Topic: "cross-modal-retrieval"
jina-ai/clip-as-service
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
Language: Python - Size: 27.4 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 12,640 - Forks: 2,075

YehLi/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
Language: Python - Size: 12.2 MB - Last synced at: 13 days ago - Pushed at: about 2 years ago - Stars: 970 - Forks: 105

Paranioar/Awesome_Matching_Pretraining_Transfering
The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insight.
Size: 369 KB - Last synced at: about 23 hours ago - Pushed at: 4 months ago - Stars: 425 - Forks: 48

zjukg/KG-MM-Survey
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
Size: 82.3 MB - Last synced at: 14 days ago - Pushed at: 5 months ago - Stars: 401 - Forks: 19

layumi/Image-Text-Embedding
TOMM2020 Dual-Path Convolutional Image-Text Embedding with Instance Loss https://arxiv.org/abs/1711.05535
Language: MATLAB - Size: 6.02 MB - Last synced at: 13 days ago - Pushed at: 3 months ago - Stars: 290 - Forks: 73

Paranioar/SGRAF
[AAAI2021] The code of “Similarity Reasoning and Filtration for Image-Text Matching”
Language: Python - Size: 794 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 197 - Forks: 37

woodfrog/vse_infty
Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021
Language: Python - Size: 3.91 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 136 - Forks: 18

yalesong/pvse
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval (CVPR 2019)
Language: Python - Size: 16 MB - Last synced at: 23 days ago - Pushed at: about 1 year ago - Stars: 134 - Forks: 23

jpthu17/EMCL
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Language: Python - Size: 23.9 MB - Last synced at: 26 days ago - Pushed at: about 1 year ago - Stars: 132 - Forks: 9

penghu-cs/DSCMR
Deep Supervised Cross-modal Retrieval (CVPR 2019, PyTorch Code)
Language: Python - Size: 10.6 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 131 - Forks: 24

jpthu17/DiffusionRet
[ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
Language: Python - Size: 5.36 MB - Last synced at: 26 days ago - Pushed at: about 1 year ago - Stars: 129 - Forks: 7

naver-ai/pcme
Official PyTorch implementation of "Probabilistic Cross-Modal Embedding" (CVPR 2021)
Language: Python - Size: 2.11 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 119 - Forks: 17

jpthu17/HBI
[CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Language: Python - Size: 51 MB - Last synced at: 19 days ago - Pushed at: 4 months ago - Stars: 116 - Forks: 5

ilaria-manco/muscall
Official implementation of "Contrastive Audio-Language Learning for Music" (ISMIR 2022)
Language: Python - Size: 193 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 106 - Forks: 11

howard-hou/BagFormer
PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction
Language: Python - Size: 3.44 MB - Last synced at: about 14 hours ago - Pushed at: over 2 years ago - Stars: 99 - Forks: 33

naver-ai/eccv-caption
Extended COCO Validation (ECCV) Caption dataset (ECCV 2022)
Language: Python - Size: 771 KB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 56 - Forks: 2

AyanKumarBhunia/on-the-fly-FGSBIR
[CVPR 2020, Oral] "Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval", IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020.
Language: Python - Size: 20.7 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 55 - Forks: 13

jpthu17/DiCoSA
[IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
Language: Python - Size: 5.56 MB - Last synced at: 26 days ago - Pushed at: about 1 year ago - Stars: 51 - Forks: 2

penghu-cs/MRL
Learning Cross-Modal Retrieval with Noisy Labels (CVPR 2021, PyTorch Code)
Language: Python - Size: 23.9 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 44 - Forks: 10

BrandonHanx/TextReID
[BMVC 2021] Text-Based Person Search with Limited Data
Language: Python - Size: 96.7 KB - Last synced at: 10 months ago - Pushed at: over 2 years ago - Stars: 42 - Forks: 5

naver-ai/pcmepp
Official PyTorch implementation of "Improved Probabilistic Image-Text Representations" (ICLR 2024)
Language: Python - Size: 15.3 MB - Last synced at: 12 months ago - Pushed at: about 1 year ago - Stars: 39 - Forks: 1

WendellGul/AGAH
Source code for paper "Adversary Guided Asymmetric Hashing for Cross-Modal Retrieval".
Language: Python - Size: 553 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 36 - Forks: 11

kyuyeonpooh/objects-that-sound
An unofficial implementation of the paper "Objects that Sound" from ECCV 2018.
Language: Python - Size: 163 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 32 - Forks: 4

BUAADreamer/SPN4CIR
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives
Language: Python - Size: 4.2 MB - Last synced at: 12 days ago - Pushed at: 6 months ago - Stars: 30 - Forks: 3

mako443/Text2Pos-CVPR2022
Code, dataset and models for our CVPR 2022 publication "Text2Pos"
Language: Python - Size: 450 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 30 - Forks: 3

penghu-cs/SDML
Scalable deep multimodal learning for cross-modal retrieval (SIGIR 2019, PyTorch Code)
Language: Python - Size: 23.5 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 30 - Forks: 13

penghu-cs/UCCH
Unsupervised Contrastive Cross-modal Hashing (IEEE TPAMI 2023, PyTorch Code)
Language: Python - Size: 2.56 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 29 - Forks: 8

slavabarkov/tidy
Offline semantic Text-to-Image and Image-to-Image search on Android powered by quantized state-of-the-art vision-language pretrained CLIP model and ONNX Runtime inference engine
Language: Kotlin - Size: 99.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 27 - Forks: 5

idealwhite/VLDeformer
PyTorch implementation of the paper "VLDeformer: Vision Language Decomposed Transformer for Fast Cross-modal Retrieval", KBS 2022
Language: Jupyter Notebook - Size: 2.42 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 27 - Forks: 3

penghu-cs/MAN
Multimodal Adversarial Network for Cross-modal Retrieval (PyTorch Code)
Language: Python - Size: 8.43 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 26 - Forks: 6

mesnico/ALADIN
Official implementation of the paper "ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval"
Language: Python - Size: 17.6 MB - Last synced at: 17 days ago - Pushed at: over 1 year ago - Stars: 24 - Forks: 6

LivXue/GNN4CMR
PyTorch implementation of the AAAI-21 paper "Dual Adversarial Label-aware Graph Neural Networks for Cross-modal Retrieval" and the TPAMI-22 paper "Integrating Multi-Label Contrastive Learning with Dual Adversarial Graph Neural Networks for Cross-Modal Retrieval".
Language: Python - Size: 596 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 24 - Forks: 3

Paranioar/RCAR
[TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching”
Language: Python - Size: 1.72 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 21 - Forks: 2

MartinYuanNJU/SEMScene
Code implementation of paper "SEMScene: Semantic-Consistency Enhanced Multi-Level Scene Graph Matching for Image-Text Retrieval" (ACM TOMM 2024).
Language: Python - Size: 36.6 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 20 - Forks: 0

jaychempan/SWAN-pytorch
Reducing Semantic Confusion: Scene-aware Aggregation Network for Remote Sensing Cross-modal Retrieval (ICMR'23 Oral)
Language: Python - Size: 2.13 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 4

xiaoyuan1996/SemanticLocalizationMetrics
The first research for semantic localization
Language: Python - Size: 41.3 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 4

peri044/STT
A multi-task model which does image captioning, sentence paraphrasing and cross-modal retrieval.
Language: Python - Size: 103 KB - Last synced at: 25 days ago - Pushed at: over 5 years ago - Stars: 18 - Forks: 5

ict-bigdatalab/VNEL
Dataset and code for EMNLP 2022 "Visual Named Entity Linking: A New Dataset and A Baseline"
Size: 4.91 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 17 - Forks: 0

haomo-ai/ModaLink
[IROS 2024] This repository contains the implementation of our paper: ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition
Language: Python - Size: 38.1 KB - Last synced at: 3 months ago - Pushed at: 8 months ago - Stars: 15 - Forks: 0

penghu-cs/MvLDAN
Multi-view Linear Discriminant Analysis Network for Cross-modal Retrieval and Cross-view Recognition (Keras&Theano Code)
Language: Python - Size: 38.5 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 14 - Forks: 5

zzezze/NeighborRetr
Official implementation of "NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval (CVPR 2025)"
Language: Python - Size: 4.71 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 12 - Forks: 1

alipay/PC2-NoiseofWeb
Noise of Web (NoW) is a challenging noisy correspondence learning (NCL) benchmark containing 100K image-text pairs for robust image-text matching/retrieval models.
Language: Python - Size: 13.6 MB - Last synced at: about 5 hours ago - Pushed at: 5 months ago - Stars: 12 - Forks: 1

ivonajdenkoska/tulip
[ICLR 2025] Official code repository for "TULIP: Token-length Upgraded CLIP"
Language: Python - Size: 27.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 11 - Forks: 0

zhouyu1996/DAQN
An implementation of our paper “DEEP ADVERSARIAL QUANTIZATION NETWORK FOR CROSS-MODAL RETRIEVAL”
Language: Python - Size: 42 KB - Last synced at: about 2 years ago - Pushed at: almost 4 years ago - Stars: 10 - Forks: 3

LivXue/ALGCN
This repository contains the author's implementation in PyTorch for the paper "Adaptive Label-aware Graph Convolutional Networks for Cross-Modal Retrieval".
Language: Python - Size: 906 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 3

frank-chris/ImageTextRetrieval
In this work, we implement different cross-modal learning schemes such as Siamese Network, Correlational Network and Deep Cross-Modal Projection Learning model and study their performance. We also propose a modified Deep Cross-Modal Projection Learning model that uses a different image feature extractor. We evaluate the model’s performance on image-text retrieval on a fashion clothing dataset.
Language: Jupyter Notebook - Size: 6.88 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 2

CLT29/semantic_neighborhoods
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval [ECCV 2020]
Language: Python - Size: 3.17 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 9 - Forks: 6

aimh-lab/visione
An AI-powered interactive video retrieval system
Language: JavaScript - Size: 187 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 7 - Forks: 2

klean2050/EEG_CrossModal
[ICASSP 2022] EEG - Music Cross Modal Learning
Language: Python - Size: 849 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 1

Paranioar/DBL
[TIP2024] The code of “Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching”
Language: Python - Size: 783 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 6 - Forks: 0

kaylode/tern
Cross-modal Retrieval using Transformer Encoder Reasoning Networks (TERN). Uses Metric Learning and FAISS for fast similarity search on GPU.
Language: Jupyter Notebook - Size: 7.23 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 1

dingyh0626/KDD-Cup-Multimodalities-Recall
KDD Cup 2020
Language: Python - Size: 283 KB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 6 - Forks: 1

Paranioar/GSSF
[TIP2024] The code of "GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning"
Size: 5.86 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 5 - Forks: 0

mariyahendriksen/ecir2022_category_to_image_retrieval
This repository contains the code for the paper "Extending CLIP for Category-to-image Retrieval in E-commerce" published at ECIR 2022.
Language: Jupyter Notebook - Size: 189 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 5 - Forks: 1

mariyahendriksen/ecir23-object-centric-vs-scene-centric-CMR
This repository contains the code for the paper "Object-centric vs. Scene-centric Image-Text Cross-modal Retrieval: A Reproducibility Study" published at ECIR 2023.
Language: Python - Size: 12.3 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 5 - Forks: 0

BUAADreamer/CCRK
[KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning
Language: Python - Size: 644 KB - Last synced at: 13 days ago - Pushed at: 9 months ago - Stars: 5 - Forks: 0

gorjanradevski/vsepp_tensorflow
Implementation of "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives" in Tensorflow.
Language: Python - Size: 49.8 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 5 - Forks: 0

peixinlei/M2HSE
PyTorch code for the paper "Complementarity is the king: A multi-modal and multi-grained hierarchical semantic enhancement network for cross-modal retrieval"
Language: Python - Size: 82 KB - Last synced at: 10 months ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

ranarag/ZSCRGAN
Language: Python - Size: 681 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 1

gorjanradevski/SMHA
My master thesis: Siamese multi-hop attention for cross-modal retrieval.
Language: Python - Size: 2.76 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 5 - Forks: 0

penghu-cs/DCHN
Joint Versus Independent Multiview Hashing for Cross-View Retrieval[J] (IEEE TCYB 2021, PyTorch Code)
Language: Python - Size: 267 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 0

PrithivirajDamodaran/WhatTheFood
An intentionally simple Image to Food cross-modal search. Created by Prithiviraj Damodaran.
Size: 1000 Bytes - Last synced at: about 15 hours ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 0

Ruggero1912/CroQS-benchmark
CroQS: a Benchmark for Cross-modal Query Suggestion
Language: HTML - Size: 1.13 GB - Last synced at: 2 days ago - Pushed at: 2 days ago - Stars: 3 - Forks: 0

GuanRunwei/VehicleFinder-CTIM
Language: Python - Size: 7.13 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

penghu-cs/ISVN
Deep Semisupervised Cross-modal Retrieval/Cross-view Recognition (IEEE TCYB 2022, PyTorch Code)
Language: Python - Size: 1.45 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

PreferredAI/sml
Code for the paper "Sentiment-Oriented Metric Learning for Text-to-Image Retrieval", ECIR'21
Language: Python - Size: 958 KB - Last synced at: 12 months ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

gorjanradevski/cross_modal_full_transfer
PyTorch code for cross-modal-retrieval on Flickr8k/30k using Bert and EfficientNet
Language: Python - Size: 72.3 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 3 - Forks: 1

ailab-kyunghee/CM2_DVC
[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval
Language: Python - Size: 119 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

AkChen/UDIH
Tensorflow implementation of UDIH
Language: Python - Size: 39.1 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 1

Yur1G4/as
The "as" keyword in programming languages is commonly used for type conversion and type assertion operations. It allows developers to explicitly convert one data type to another or assert that an interface value holds a specific underlying data type.
Size: 1000 Bytes - Last synced at: 4 days ago - Pushed at: 4 days ago - Stars: 1 - Forks: 0

runjtu/vpr-arxiv-daily (fork of Vincentqyw/cv-arxiv-daily)
Automatically update Visual Place Recognition papers daily using GitHub Actions (updates every 12 hours)
Language: Python - Size: 24.7 MB - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

b7GsWQMA2XDrdR/VNEL
VNEL (Visual Named Entity Linking) is a new task that takes an image alone as input and performs entity linking on it, focusing on CBIR, cross-modal retrieval, and multimodal fusion.
Size: 2.26 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

SahilC/Cross-Modal-Style
An attempt to transfer sentence to image style.
Language: Python - Size: 27.6 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

frank-chris/Image-Text-Retrieval-Web-App
Flask Web App for ES-654 Machine Learning course project
Language: Python - Size: 135 KB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 1 - Forks: 2

hthoai/image-text-matching
Image-Text Matching Model Zoo
Language: Python - Size: 12.7 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 2

JThuge/OCDL
PyTorch implementation of the ICASSP 2025 paper "Object-Centric Discriminative Learning for Text-Based Person Retrieval"
Size: 1.95 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 0 - Forks: 0

serizard/text-3d-retrieval
Research project at AI·Robotics Institute, KIST
Language: Python - Size: 12.2 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

raydog99/solar
Unified optimal transport framework for cross-modal retrieval
Language: OCaml - Size: 270 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Paranioar/Awesome_Image_Text_Retrieval_Benchmark
The Unified Code of Image-Text Retrieval for Further Exploration.
Language: Python - Size: 41 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

kunjmehta/cross-modal-retrieval-food-ai
Course project for 198:536 at Rutgers University. The project covers cross-modal retrieval of food recipes from recipe images, ingredients, and instructions, using the Recipe1M dataset.
Language: Jupyter Notebook - Size: 5.17 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 1

huycq1712/ViTAA (fork of Jarr0d/ViTAA)
ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language
Language: Python - Size: 68.4 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

LongLong-Jing/XMV
PyTorch implementation for Self-supervised Modal and View Invariant Feature Learning
Size: 7.27 MB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

sontung/hci-intermodal-reasoning
Fachpraktikum project for Human-computer interaction course
Language: Jupyter Notebook - Size: 6.12 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 1
