GitHub topics: cross-modal-retrieval

Repositories

runjtu/vpr-arxiv-daily Fork of Vincentqyw/cv-arxiv-daily

Automatically update Visual Place Recognition papers daily using GitHub Actions (updates every 12 hours)

Language: Python - Size: 24.7 MB - Last synced at: about 16 hours ago - Pushed at: 1 day ago - Stars: 1 - Forks: 0

Yur1G4/as

The "as" keyword in programming languages is commonly used for type conversion and type assertion operations. It allows developers to explicitly convert one data type to another or assert that an interface value holds a specific underlying data type.

Size: 1000 Bytes - Last synced at: 5 days ago - Pushed at: 5 days ago - Stars: 1 - Forks: 0

howard-hou/BagFormer

PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction

Language: Python - Size: 3.44 MB - Last synced at: 4 days ago - Pushed at: over 2 years ago - Stars: 99 - Forks: 33

zzezze/NeighborRetr

Official implementation of "NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval (CVPR 2025)"

Language: Python - Size: 4.71 MB - Last synced at: 6 days ago - Pushed at: 7 days ago - Stars: 12 - Forks: 1

jina-ai/clip-as-service

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

Language: Python - Size: 27.4 MB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 12,630 - Forks: 2,075

Paranioar/Awesome_Matching_Pretraining_Transfering

Paper list on large multi-modality models (perception, generation, unification), parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching, for preliminary insight.

Size: 369 KB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 425 - Forks: 48

Ruggero1912/CroQS-benchmark

CroQS: a Benchmark for Cross-modal Query Suggestion

Language: HTML - Size: 1.13 GB - Last synced at: 9 days ago - Pushed at: 9 days ago - Stars: 3 - Forks: 0

zjukg/KG-MM-Survey

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

Size: 82.3 MB - Last synced at: 8 days ago - Pushed at: 4 months ago - Stars: 401 - Forks: 19

layumi/Image-Text-Embedding

TOMM2020 Dual-Path Convolutional Image-Text Embedding with Instance Loss 🐾 https://arxiv.org/abs/1711.05535

Language: MATLAB - Size: 6.02 MB - Last synced at: 8 days ago - Pushed at: 3 months ago - Stars: 290 - Forks: 73

naver-ai/eccv-caption

Extended COCO Validation (ECCV) Caption dataset (ECCV 2022)

Language: Python - Size: 771 KB - Last synced at: 2 days ago - Pushed at: about 1 year ago - Stars: 56 - Forks: 2

jpthu17/HBI

[CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

Language: Python - Size: 51 MB - Last synced at: 14 days ago - Pushed at: 4 months ago - Stars: 116 - Forks: 5

jpthu17/EMCL

[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

Language: Python - Size: 23.9 MB - Last synced at: 21 days ago - Pushed at: about 1 year ago - Stars: 132 - Forks: 9

jpthu17/DiffusionRet

[ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model

Language: Python - Size: 5.36 MB - Last synced at: 21 days ago - Pushed at: about 1 year ago - Stars: 129 - Forks: 7

mesnico/ALADIN

Official implementation of the paper "ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval"

Language: Python - Size: 17.6 MB - Last synced at: 11 days ago - Pushed at: over 1 year ago - Stars: 24 - Forks: 6

BUAADreamer/SPN4CIR

[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives

Language: Python - Size: 4.2 MB - Last synced at: 7 days ago - Pushed at: 6 months ago - Stars: 30 - Forks: 3

YehLi/xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

Language: Python - Size: 12.2 MB - Last synced at: 7 days ago - Pushed at: about 2 years ago - Stars: 970 - Forks: 105

jpthu17/DiCoSA

[IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment

Language: Python - Size: 5.56 MB - Last synced at: 21 days ago - Pushed at: about 1 year ago - Stars: 51 - Forks: 2

ivonajdenkoska/tulip

[ICLR 2025] Official code repository for "TULIP: Token-length Upgraded CLIP"

Language: Python - Size: 27.3 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 11 - Forks: 0

JThuge/OCDL

PyTorch implementation of the ICASSP 2025 paper "Object-Centric Discriminative Learning for Text-Based Person Retrieval"

Size: 1.95 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 0 - Forks: 0

Paranioar/GSSF

[TIP2024] The code of "GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning"

Size: 5.86 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 5 - Forks: 0

ilaria-manco/muscall

Official implementation of "Contrastive Audio-Language Learning for Music" (ISMIR 2022)

Language: Python - Size: 193 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 106 - Forks: 11

PrithivirajDamodaran/WhatTheFood

An intentionally simple Image to Food cross-modal search. Created by Prithiviraj Damodaran.

Size: 1000 Bytes - Last synced at: 4 days ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 0

alipay/PC2-NoiseofWeb

Noise of Web (NoW) is a challenging noisy correspondence learning (NCL) benchmark containing 100K image-text pairs for robust image-text matching/retrieval models.

Language: Python - Size: 13.6 MB - Last synced at: 16 days ago - Pushed at: 5 months ago - Stars: 12 - Forks: 1

mariyahendriksen/ecir2022_category_to_image_retrieval

This repository contains the code for the paper "Extending CLIP for Category-to-image Retrieval in E-commerce" published at ECIR 2022.

Language: Jupyter Notebook - Size: 189 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 5 - Forks: 1

mariyahendriksen/ecir23-object-centric-vs-scene-centric-CMR

This repository contains the code for the paper "Object-centric vs. Scene-centric Image-Text Cross-modal Retrieval: A Reproducibility Study" published at ECIR 2023.

Language: Python - Size: 12.3 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 5 - Forks: 0

haomo-ai/ModaLink

[IROS 2024] This repository contains the implementation of our paper: ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition

Language: Python - Size: 38.1 KB - Last synced at: 3 months ago - Pushed at: 7 months ago - Stars: 15 - Forks: 0

yalesong/pvse

Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval (CVPR 2019)

Language: Python - Size: 16 MB - Last synced at: 18 days ago - Pushed at: about 1 year ago - Stars: 134 - Forks: 23

BUAADreamer/CCRK

[KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning

Language: Python - Size: 644 KB - Last synced at: 7 days ago - Pushed at: 9 months ago - Stars: 5 - Forks: 0

MartinYuanNJU/SEMScene

Code implementation of the paper "SEMScene: Semantic-Consistency Enhanced Multi-Level Scene Graph Matching for Image-Text Retrieval" (ACM TOMM 2024).

Language: Python - Size: 36.6 MB - Last synced at: 8 months ago - Pushed at: 8 months ago - Stars: 20 - Forks: 0

serizard/text-3d-retrieval

Research project at AI·Robotics Institute, KIST

Language: Python - Size: 12.2 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 0 - Forks: 0

peixinlei/M2HSE

PyTorch code for the paper "Complementarity is the king: A multi-modal and multi-grained hierarchical semantic enhancement network for cross-modal retrieval"

Language: Python - Size: 82 KB - Last synced at: 10 months ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

raydog99/solar

Unified optimal transport framework for cross-modal retrieval

Language: OCaml - Size: 270 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

Paranioar/DBL

[TIP2024] The code of “Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching”

Language: Python - Size: 783 KB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 6 - Forks: 0

aimh-lab/visione

An AI-powered interactive video retrieval system

Language: JavaScript - Size: 187 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 7 - Forks: 2

naver-ai/pcmepp

Official PyTorch implementation of "Improved Probabilistic Image-Text Representations" (ICLR 2024)

Language: Python - Size: 15.3 MB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 39 - Forks: 1

naver-ai/pcme

Official PyTorch implementation of "Probabilistic Cross-Modal Embedding" (CVPR 2021)

Language: Python - Size: 2.11 MB - Last synced at: 11 months ago - Pushed at: about 1 year ago - Stars: 119 - Forks: 17

kunjmehta/cross-modal-retrieval-food-ai

Course project for 198:536 at Rutgers University. The project covers cross-modal retrieval of food recipes, given images together with each recipe's ingredients and instructions, using the Recipe1M dataset.

Language: Jupyter Notebook - Size: 5.17 MB - Last synced at: 12 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 1

ailab-kyunghee/CM2_DVC

[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval

Language: Python - Size: 119 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

Paranioar/Awesome_Image_Text_Retrieval_Benchmark

The Unified Code of Image-Text Retrieval for Further Exploration.

Language: Python - Size: 41 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Paranioar/SGRAF

[AAAI2021] The code of “Similarity Reasoning and Filtration for Image-Text Matching”

Language: Python - Size: 794 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 197 - Forks: 37

Paranioar/RCAR

[TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching”

Language: Python - Size: 1.72 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 21 - Forks: 2

BrandonHanx/TextReID

[BMVC 2021] Text-Based Person Search with Limited Data

Language: Python - Size: 96.7 KB - Last synced at: 10 months ago - Pushed at: over 2 years ago - Stars: 42 - Forks: 5

peri044/STT

A multi-task model which does image captioning, sentence paraphrasing and cross-modal retrieval.

Language: Python - Size: 103 KB - Last synced at: 20 days ago - Pushed at: over 5 years ago - Stars: 18 - Forks: 5

kyuyeonpooh/objects-that-sound

An unofficial implementation of the paper "Objects that Sound" (ECCV 2018).

Language: Python - Size: 163 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 32 - Forks: 4

penghu-cs/UCCH

Unsupervised Contrastive Cross-modal Hashing (IEEE TPAMI 2023, PyTorch Code)

Language: Python - Size: 2.56 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 29 - Forks: 8

jaychempan/SWAN-pytorch

Reducing Semantic Confusion: Scene-aware Aggregation Network for Remote Sensing Cross-modal Retrieval (ICMR'23 Oral)

Language: Python - Size: 2.13 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 4

xiaoyuan1996/SemanticLocalizationMetrics

The first research on semantic localization

Language: Python - Size: 41.3 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 4

penghu-cs/DSCMR

Deep Supervised Cross-modal Retrieval (CVPR 2019, PyTorch Code)

Language: Python - Size: 10.6 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 131 - Forks: 24

slavabarkov/tidy

Offline semantic Text-to-Image and Image-to-Image search on Android powered by quantized state-of-the-art vision-language pretrained CLIP model and ONNX Runtime inference engine

Language: Kotlin - Size: 99.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 27 - Forks: 5

woodfrog/vse_infty

Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021

Language: Python - Size: 3.91 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 136 - Forks: 18

gorjanradevski/SMHA

My master's thesis: Siamese multi-hop attention for cross-modal retrieval.

Language: Python - Size: 2.76 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 5 - Forks: 0

CLT29/semantic_neighborhoods

Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval [ECCV 2020]

Language: Python - Size: 3.17 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 9 - Forks: 6

WendellGul/AGAH

Source code for paper "Adversary Guided Asymmetric Hashing for Cross-Modal Retrieval".

Language: Python - Size: 553 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 36 - Forks: 11

penghu-cs/MvLDAN

Multi-view Linear Discriminant Analysis Network for Cross-modal Retrieval and Cross-view Recognition (Keras&Theano Code)

Language: Python - Size: 38.5 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 14 - Forks: 5

huycq1712/ViTAA Fork of Jarr0d/ViTAA

ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language

Language: Python - Size: 68.4 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

penghu-cs/MRL

Learning Cross-Modal Retrieval with Noisy Labels (CVPR 2021, PyTorch Code)

Language: Python - Size: 23.9 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 44 - Forks: 10

LivXue/GNN4CMR

PyTorch implementation of the AAAI-21 paper "Dual Adversarial Label-aware Graph Neural Networks for Cross-modal Retrieval" and the TPAMI-22 paper "Integrating Multi-Label Contrastive Learning with Dual Adversarial Graph Neural Networks for Cross-Modal Retrieval".

Language: Python - Size: 596 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 24 - Forks: 3

GuanRunwei/VehicleFinder-CTIM

Language: Python - Size: 7.13 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 3 - Forks: 0

LivXue/ALGCN

This repository contains the author's implementation in PyTorch for the paper "Adaptive Label-aware Graph Convolutional Networks for Cross-Modal Retrieval".

Language: Python - Size: 906 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 3

klean2050/EEG_CrossModal

[ICASSP 2022] EEG - Music Cross Modal Learning

Language: Python - Size: 849 KB - Last synced at: over 1 year ago - Pushed at: almost 3 years ago - Stars: 7 - Forks: 1

ict-bigdatalab/VNEL

Dataset and code for EMNLP 2022 "Visual Named Entity Linking: A New Dataset and A Baseline"

Size: 4.91 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 17 - Forks: 0

frank-chris/ImageTextRetrieval

In this work, we implement different cross-modal learning schemes such as Siamese Network, Correlational Network and Deep Cross-Modal Projection Learning model and study their performance. We also propose a modified Deep Cross-Modal Projection Learning model that uses a different image feature extractor. We evaluate the model’s performance on image-text retrieval on a fashion clothing dataset.

Language: Jupyter Notebook - Size: 6.88 MB - Last synced at: almost 2 years ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 2