cross-modal-retrieval | Topic | Ecosyste.ms: Repos

Topic: "cross-modal-retrieval"

jina-ai/clip-as-service

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

Language: Python - Size: 27.4 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 12,665 - Forks: 2,078

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

Language: Python - Size: 12.2 MB - Last synced at: 2 days ago - Pushed at: about 2 years ago - Stars: 969 - Forks: 105

Paranioar/Awesome_Matching_Pretraining_Transfering

The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insight.

Size: 369 KB - Last synced at: 12 days ago - Pushed at: 5 months ago - Stars: 423 - Forks: 48

zjukg/KG-MM-Survey

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

Size: 82.3 MB - Last synced at: about 1 month ago - Pushed at: 5 months ago - Stars: 401 - Forks: 19

layumi/Image-Text-Embedding

TOMM2020 Dual-Path Convolutional Image-Text Embedding with Instance Loss 🐾 https://arxiv.org/abs/1711.05535

Language: MATLAB - Size: 6.02 MB - Last synced at: 3 days ago - Pushed at: 4 months ago - Stars: 291 - Forks: 73

Paranioar/SGRAF

[AAAI2021] The code of “Similarity Reasoning and Filtration for Image-Text Matching”

Language: Python - Size: 794 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 197 - Forks: 37

woodfrog/vse_infty

Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021

Language: Python - Size: 3.91 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 136 - Forks: 18

jpthu17/EMCL

[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

Language: Python - Size: 23.9 MB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 134 - Forks: 10

yalesong/pvse

Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval (CVPR 2019)

Language: Python - Size: 16 MB - Last synced at: about 2 months ago - Pushed at: about 1 year ago - Stars: 134 - Forks: 23

jpthu17/DiffusionRet

[ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model

Language: Python - Size: 5.36 MB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 131 - Forks: 7

penghu-cs/DSCMR

Deep Supervised Cross-modal Retrieval (CVPR 2019, PyTorch Code)

Language: Python - Size: 10.6 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 131 - Forks: 24

naver-ai/pcme

Official Pytorch implementation of "Probabilistic Cross-Modal Embedding" (CVPR 2021)

Language: Python - Size: 2.11 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 119 - Forks: 17

jpthu17/HBI

[CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

Language: Python - Size: 51 MB - Last synced at: about 4 hours ago - Pushed at: 5 months ago - Stars: 117 - Forks: 5

ilaria-manco/muscall

Official implementation of "Contrastive Audio-Language Learning for Music" (ISMIR 2022)

Language: Python - Size: 193 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 106 - Forks: 11

howard-hou/BagFormer

PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction

Language: Python - Size: 3.44 MB - Last synced at: 7 days ago - Pushed at: over 2 years ago - Stars: 99 - Forks: 33

naver-ai/eccv-caption

Extended COCO Validation (ECCV) Caption dataset (ECCV 2022)

Language: Python - Size: 771 KB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 56 - Forks: 2

AyanKumarBhunia/on-the-fly-FGSBIR

[CVPR 2020, Oral] "Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval", IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020.

Language: Python - Size: 20.7 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 55 - Forks: 13

jpthu17/DiCoSA

[IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment

Language: Python - Size: 5.56 MB - Last synced at: 19 days ago - Pushed at: about 1 year ago - Stars: 51 - Forks: 2

penghu-cs/MRL

Learning Cross-Modal Retrieval with Noisy Labels (CVPR 2021, PyTorch Code)

Language: Python - Size: 23.9 MB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 44 - Forks: 10

BrandonHanx/TextReID

[BMVC 2021] Text-Based Person Search with Limited Data

Language: Python - Size: 96.7 KB - Last synced at: 11 months ago - Pushed at: almost 3 years ago - Stars: 42 - Forks: 5

naver-ai/pcmepp

Official Pytorch implementation of "Improved Probabilistic Image-Text Representations" (ICLR 2024)

Language: Python - Size: 15.3 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 39 - Forks: 1

WendellGul/AGAH

Source code for paper "Adversary Guided Asymmetric Hashing for Cross-Modal Retrieval".

Language: Python - Size: 553 KB - Last synced at: about 1 year ago - Pushed at: over 5 years ago - Stars: 36 - Forks: 11

kyuyeonpooh/objects-that-sound

The unofficial implementation of paper, "Objects that Sound", from ECCV 2018.

Language: Python - Size: 163 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 32 - Forks: 4

360CVGroup/FG-CLIP

New generation of CLIP with fine grained discrimination capability, ICML2025

Language: Python - Size: 5.54 MB - Last synced at: 10 days ago - Pushed at: 10 days ago - Stars: 31 - Forks: 1

aimh-lab/visione

An AI-powered interactive video retrieval system

Language: JavaScript - Size: 187 MB - Last synced at: 16 days ago - Pushed at: 8 months ago - Stars: 31 - Forks: 4

BUAADreamer/SPN4CIR

[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives

Language: Python - Size: 4.2 MB - Last synced at: about 1 month ago - Pushed at: 7 months ago - Stars: 30 - Forks: 3

mako443/Text2Pos-CVPR2022

Code, dataset and models for our CVPR 2022 publication "Text2Pos"

Language: Python - Size: 450 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 30 - Forks: 3

penghu-cs/SDML

Scalable deep multimodal learning for cross-modal retrieval (SIGIR 2019, PyTorch Code)

Language: Python - Size: 23.5 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 30 - Forks: 13

penghu-cs/UCCH

Unsupervised Contrastive Cross-modal Hashing (IEEE TPAMI 2023, PyTorch Code)

Language: Python - Size: 2.56 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 29 - Forks: 8

slavabarkov/tidy

Offline semantic Text-to-Image and Image-to-Image search on Android powered by quantized state-of-the-art vision-language pretrained CLIP model and ONNX Runtime inference engine

Language: Kotlin - Size: 99.3 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 27 - Forks: 5

idealwhite/VLDeformer

PyTorch implementation of the paper "VLDeformer: Vision Language Decomposed Transformer for Fast Cross-modal Retrieval", KBS 2022

Language: Jupyter Notebook - Size: 2.42 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 27 - Forks: 3

penghu-cs/MAN

Multimodal Adversarial Network for Cross-modal Retrieval (PyTorch Code)

Language: Python - Size: 8.43 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 26 - Forks: 6

mesnico/ALADIN

Official implementation of the paper "ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval"

Language: Python - Size: 17.6 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 24 - Forks: 6

LivXue/GNN4CMR

PyTorch implementation of the AAAI-21 paper "Dual Adversarial Label-aware Graph Neural Networks for Cross-modal Retrieval" and the TPAMI-22 paper "Integrating Multi-Label Contrastive Learning with Dual Adversarial Graph Neural Networks for Cross-Modal Retrieval".

Language: Python - Size: 596 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 24 - Forks: 3

Paranioar/RCAR

[TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching”

Language: Python - Size: 1.72 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 21 - Forks: 2

MartinYuanNJU/SEMScene

Code implementation of paper "SEMScene: Semantic-Consistency Enhanced Multi-Level Scene Graph Matching for Image-Text Retrieval" (ACM TOMM 2024).

Language: Python - Size: 36.6 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 20 - Forks: 0

jaychempan/SWAN-pytorch

Reducing Semantic Confusion: Scene-aware Aggregation Network for Remote Sensing Cross-modal Retrieval (ICMR'23 Oral)

Language: Python - Size: 2.13 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 4

xiaoyuan1996/SemanticLocalizationMetrics

The first research for semantic localization

Language: Python - Size: 41.3 MB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 19 - Forks: 4

peri044/STT

A multi-task model which does image captioning, sentence paraphrasing and cross-modal retrieval.

Language: Python - Size: 103 KB - Last synced at: 13 days ago - Pushed at: over 5 years ago - Stars: 18 - Forks: 5

ict-bigdatalab/VNEL

Dataset and code for EMNLP 2022 "Visual Named Entity Linking: A New Dataset and A Baseline"

Size: 4.91 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 17 - Forks: 0

haomo-ai/ModaLink

[IROS 2024] This repository contains the implementation of our paper: ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition

Language: Python - Size: 38.1 KB - Last synced at: 4 months ago - Pushed at: 8 months ago - Stars: 15 - Forks: 0

penghu-cs/MvLDAN

Multi-view Linear Discriminant Analysis Network for Cross-modal Retrieval and Cross-view Recognition (Keras&Theano Code)

Language: Python - Size: 38.5 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 14 - Forks: 5

zzezze/NeighborRetr

Official implementation of "NeighborRetr: Balancing Hub Centrality in Cross-Modal Retrieval (CVPR 2025)"

Language: Python - Size: 4.71 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 12 - Forks: 1

alipay/PC2-NoiseofWeb

Noise of Web (NoW) is a challenging noisy correspondence learning (NCL) benchmark containing 100K image-text pairs for robust image-text matching/retrieval models.

Language: Python - Size: 13.6 MB - Last synced at: 24 days ago - Pushed at: 6 months ago - Stars: 12 - Forks: 1

ivonajdenkoska/tulip

[ICLR 2025] Official code repository for "TULIP: Token-length Upgraded CLIP"

Language: Python - Size: 27.3 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 11 - Forks: 0

zhouyu1996/DAQN

An implementation of our paper “DEEP ADVERSARIAL QUANTIZATION NETWORK FOR CROSS-MODAL RETRIEVAL”

Language: Python - Size: 42 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 10 - Forks: 3

LivXue/ALGCN

This repository contains the author's implementation in PyTorch for the paper "Adaptive Label-aware Graph Convolutional Networks for Cross-Modal Retrieval".

Language: Python - Size: 906 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 3

frank-chris/ImageTextRetrieval

In this work, we implement different cross-modal learning schemes such as Siamese Network, Correlational Network and Deep Cross-Modal Projection Learning model and study their performance. We also propose a modified Deep Cross-Modal Projection Learning model that uses a different image feature extractor. We evaluate the model’s performance on image-text retrieval on a fashion clothing dataset.

Language: Jupyter Notebook - Size: 6.88 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 9 - Forks: 2

CLT29/semantic_neighborhoods

Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval [ECCV 2020]

Language: Python - Size: 3.17 MB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 9 - Forks: 6

klean2050/EEG_CrossModal

[ICASSP 2022] EEG - Music Cross Modal Learning

Language: Python - Size: 849 KB - Last synced at: almost 2 years ago - Pushed at: about 3 years ago - Stars: 7 - Forks: 1

Paranioar/DBL

[TIP2024] The code of “Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching”

Language: Python - Size: 783 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 6 - Forks: 0

kaylode/tern

Cross-modal Retrieval using Transformer Encoder Reasoning Networks (TERN). Uses Metric Learning and FAISS for fast similarity search on GPU.

Language: Jupyter Notebook - Size: 7.23 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 6 - Forks: 1

dingyh0626/KDD-Cup-Multimodalities-Recall

KDD Cup 2020

Language: Python - Size: 283 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 6 - Forks: 1

Paranioar/GSSF

[TIP2024] The code of "GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning"

Size: 5.86 KB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 5 - Forks: 0

mariyahendriksen/ecir2022_category_to_image_retrieval

This repository contains the code for the paper "Extending CLIP for Category-to-image Retrieval in E-commerce" published at ECIR 2022.

Language: Jupyter Notebook - Size: 189 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 5 - Forks: 1

mariyahendriksen/ecir23-object-centric-vs-scene-centric-CMR

This repository contains the code for the paper "Object-centric vs. Scene-centric Image-Text Cross-modal Retrieval: A Reproducibility Study" published at ECIR 2023.

Language: Python - Size: 12.3 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 5 - Forks: 0

BUAADreamer/CCRK

[KDD 2024] Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning

Language: Python - Size: 644 KB - Last synced at: about 1 month ago - Pushed at: 10 months ago - Stars: 5 - Forks: 0

gorjanradevski/vsepp_tensorflow

Implementation of "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives" in Tensorflow.

Language: Python - Size: 49.8 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

peixinlei/M2HSE

PyTorch code for the paper "Complementarity is the king: A multi-modal and multi-grained hierarchical semantic enhancement network for cross-modal retrieval"

Language: Python - Size: 82 KB - Last synced at: 11 months ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

ranarag/ZSCRGAN

Language: Python - Size: 681 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 1

gorjanradevski/SMHA

My master thesis: Siamese multi-hop attention for cross-modal retrieval.

Language: Python - Size: 2.76 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 5 - Forks: 0

penghu-cs/DCHN

Joint Versus Independent Multiview Hashing for Cross-View Retrieval[J] (IEEE TCYB 2021, PyTorch Code)

Language: Python - Size: 267 KB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 0

PrithivirajDamodaran/WhatTheFood

An intentionally simple Image to Food cross-modal search. Created by Prithiviraj Damodaran.

Size: 1000 Bytes - Last synced at: 7 days ago - Pushed at: over 3 years ago - Stars: 4 - Forks: 0

runjtu/vpr-arxiv-daily Fork of Vincentqyw/cv-arxiv-daily

Automatically updates a daily list of Visual Place Recognition (VPR) papers using GitHub Actions (updated every 12 hours). VPR is difficult and evolves with the times; if you do not read related new papers as exhaustively as you can, you will be challenged by reviewers.

Language: Python - Size: 24.9 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 3 - Forks: 0

Ruggero1912/CroQS-benchmark

CroQS: a Benchmark for Cross-modal Query Suggestion

Language: HTML - Size: 1.13 GB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 3 - Forks: 0

GuanRunwei/VehicleFinder-CTIM

Language: Python - Size: 7.13 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

penghu-cs/ISVN

Deep Semisupervised Cross-modal Retrieval/Cross-view Recognition (IEEE TCYB 2022, PyTorch Code)

Language: Python - Size: 1.45 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 3 - Forks: 0

PreferredAI/sml

Code for the paper "Sentiment-Oriented Metric Learning for Text-to-Image Retrieval", ECIR'21

Language: Python - Size: 958 KB - Last synced at: about 1 year ago - Pushed at: over 3 years ago - Stars: 3 - Forks: 0

gorjanradevski/cross_modal_full_transfer

PyTorch code for cross-modal-retrieval on Flickr8k/30k using Bert and EfficientNet

Language: Python - Size: 72.3 KB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 3 - Forks: 1

JThuge/OCDL

Pytorch implementation of the ICASSP 2025 paper "Object-Centric Discriminative Learning for Text-Based Person Retrieval"

Language: Jupyter Notebook - Size: 37 MB - Last synced at: 12 days ago - Pushed at: 12 days ago - Stars: 2 - Forks: 0

ailab-kyunghee/CM2_DVC

[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval

Language: Python - Size: 119 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 2 - Forks: 0

AkChen/UDIH

Tensorflow implementation of UDIH

Language: Python - Size: 39.1 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 1

Yur1G4/as

The "as" keyword in programming languages is commonly used for type conversion and type assertion operations. It allows developers to explicitly convert one data type to another or assert that an interface value holds a specific underlying data type.

Size: 1000 Bytes - Last synced at: 6 days ago - Pushed at: 6 days ago - Stars: 1 - Forks: 0

b7GsWQMA2XDrdR/VNEL

VNEL (Visual Named Entity Linking) is a brand-new task that takes a pure image as input and performs entity linking on it, focusing on CBIR, cross-modal retrieval, and multimodal fusion.

Size: 2.26 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

SahilC/Cross-Modal-Style

An attempt to transfer sentence to image style.

Language: Python - Size: 27.6 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

frank-chris/Image-Text-Retrieval-Web-App

Flask Web App for ES-654 Machine Learning course project

Language: Python - Size: 135 KB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 2

hthoai/image-text-matching

Image-Text Matching Model Zoo

Language: Python - Size: 12.7 MB - Last synced at: about 2 years ago - Pushed at: about 4 years ago - Stars: 1 - Forks: 2

serizard/text-3d-retrieval

Research project at AI·Robotics Institute, KIST

Language: Python - Size: 12.2 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 0 - Forks: 0

raydog99/solar

Unified optimal transport framework for cross-modal retrieval

Language: OCaml - Size: 270 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 0 - Forks: 0

Paranioar/Awesome_Image_Text_Retrieval_Benchmark

The Unified Code of Image-Text Retrieval for Further Exploration.

Language: Python - Size: 41 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

kunjmehta/cross-modal-retrieval-food-ai

Course project for 198:536 at Rutgers University. The project addresses cross-modal retrieval of food recipes from images and from each recipe's ingredients and instructions, using the Recipe1M dataset.

Language: Jupyter Notebook - Size: 5.17 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 0 - Forks: 1

huycq1712/ViTAA Fork of Jarr0d/ViTAA

ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language

Language: Python - Size: 68.4 KB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

LongLong-Jing/XMV

PyTorch implementation for Self-supervised Modal and View Invariant Feature Learning

Size: 7.27 MB - Last synced at: almost 2 years ago - Pushed at: almost 5 years ago - Stars: 0 - Forks: 0

sontung/hci-intermodal-reasoning

Fachpraktikum project for Human-computer interaction course

Language: Jupyter Notebook - Size: 6.12 MB - Last synced at: about 2 years ago - Pushed at: about 5 years ago - Stars: 0 - Forks: 1