Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: cross-modal-retrieval
MartinYuanNJU/SEMScene
Code implementation of the paper "SEMScene: Semantic-Consistency Enhanced Multi-Level Scene Graph Matching for Image-Text Retrieval" (ACM TOMM 2024).
Language: Python - Size: 36.6 MB - Last synced: 2 days ago - Pushed: 3 days ago - Stars: 14 - Forks: 0
zjukg/KG-MM-Survey
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
Size: 82.2 MB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 204 - Forks: 13
Paranioar/Awesome_Matching_Pretraining_Transfering
A paper list covering large multi-modality models, parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching, for preliminary insight.
Size: 215 KB - Last synced: about 13 hours ago - Pushed: 2 months ago - Stars: 354 - Forks: 46
jpthu17/DiCoSA
[IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
Language: Python - Size: 5.56 MB - Last synced: 6 days ago - Pushed: about 1 month ago - Stars: 42 - Forks: 2
jpthu17/EMCL
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Language: Python - Size: 23.9 MB - Last synced: 6 days ago - Pushed: about 1 month ago - Stars: 98 - Forks: 7
naver-ai/pcmepp
Official PyTorch implementation of "Improved Probabilistic Image-Text Representations" (ICLR 2024)
Language: Python - Size: 15.3 MB - Last synced: 6 days ago - Pushed: about 1 month ago - Stars: 39 - Forks: 1
jpthu17/DiffusionRet
[ICCV 2023] DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
Language: Python - Size: 5.36 MB - Last synced: 6 days ago - Pushed: about 1 month ago - Stars: 99 - Forks: 4
naver-ai/pcme
Official PyTorch implementation of "Probabilistic Cross-Modal Embedding" (CVPR 2021)
Language: Python - Size: 2.11 MB - Last synced: 6 days ago - Pushed: 3 months ago - Stars: 119 - Forks: 17
jpthu17/HBI
[CVPR 2023 Highlight] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Language: Python - Size: 51 MB - Last synced: 6 days ago - Pushed: about 1 month ago - Stars: 94 - Forks: 4
kunjmehta/cross-modal-retrieval-food-ai
Course project for 198:536 at Rutgers University: cross-modal retrieval of food recipes given the recipe's images, ingredients, and instructions, using the Recipe1M dataset.
Language: Jupyter Notebook - Size: 5.17 MB - Last synced: 29 days ago - Pushed: over 1 year ago - Stars: 0 - Forks: 1
ailab-kyunghee/CM2_DVC
[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval
Language: Python - Size: 119 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 2 - Forks: 0
naver-ai/eccv-caption
Extended COCO Validation (ECCV) Caption dataset (ECCV 2022)
Language: Python - Size: 771 KB - Last synced: 6 days ago - Pushed: 3 months ago - Stars: 51 - Forks: 2
Paranioar/Awesome_Image_Text_Retrieval_Benchmark
Unified code for image-text retrieval, for further exploration.
Language: Python - Size: 41 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0
Paranioar/SGRAF
[AAAI 2021] Code for “Similarity Reasoning and Filtration for Image-Text Matching”
Language: Python - Size: 794 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 197 - Forks: 37
Paranioar/RCAR
[TIP 2023] Code for “Plug-and-Play Regulators for Image-Text Matching”
Language: Python - Size: 1.72 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 21 - Forks: 2
jina-ai/clip-as-service
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
Language: Python - Size: 27.4 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 12,060 - Forks: 2,055
layumi/Image-Text-Embedding
TOMM 2020 Dual-Path Convolutional Image-Text Embedding 🐾 https://arxiv.org/abs/1711.05535
Language: MATLAB - Size: 6.02 MB - Last synced: 8 days ago - Pushed: 11 months ago - Stars: 280 - Forks: 73
aimh-lab/visione
An AI-powered interactive video retrieval system
Language: JavaScript - Size: 187 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 6 - Forks: 0
YehLi/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
Language: Python - Size: 12.2 MB - Last synced: 3 months ago - Pushed: about 1 year ago - Stars: 996 - Forks: 124
kyuyeonpooh/objects-that-sound
An unofficial implementation of the ECCV 2018 paper "Objects that Sound".
Language: Python - Size: 163 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 32 - Forks: 4
yalesong/pvse
Polysemous Visual-Semantic Embedding for Cross-Modal Retrieval (CVPR 2019)
Language: Python - Size: 15.9 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 128 - Forks: 24
mariyahendriksen/ecir2022_category_to_image_retrieval
This repository contains the code for the paper "Extending CLIP for Category-to-image Retrieval in E-commerce" published at ECIR 2022.
Language: Jupyter Notebook - Size: 169 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 3 - Forks: 0
penghu-cs/UCCH
Unsupervised Contrastive Cross-modal Hashing (IEEE TPAMI 2023, PyTorch Code)
Language: Python - Size: 2.56 MB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 29 - Forks: 8
jaychempan/SWAN-pytorch
Reducing Semantic Confusion: Scene-aware Aggregation Network for Remote Sensing Cross-modal Retrieval (ICMR'23 Oral)
Language: Python - Size: 2.13 MB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 19 - Forks: 4
xiaoyuan1996/SemanticLocalizationMetrics
The first research on semantic localization
Language: Python - Size: 41.3 MB - Last synced: 27 days ago - Pushed: 6 months ago - Stars: 19 - Forks: 4
penghu-cs/DSCMR
Deep Supervised Cross-modal Retrieval (CVPR 2019, PyTorch Code)
Language: Python - Size: 10.6 MB - Last synced: 6 months ago - Pushed: over 4 years ago - Stars: 131 - Forks: 24
howard-hou/BagFormer
PyTorch code for BagFormer: Better Cross-Modal Retrieval via bag-wise interaction
Language: Python - Size: 3.44 MB - Last synced: 6 months ago - Pushed: over 1 year ago - Stars: 114 - Forks: 33
mesnico/ALADIN
Official implementation of the paper "ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval"
Language: Python - Size: 17.6 MB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 16 - Forks: 6
slavabarkov/tidy
Offline semantic Text-to-Image and Image-to-Image search on Android powered by quantized state-of-the-art vision-language pretrained CLIP model and ONNX Runtime inference engine
Language: Kotlin - Size: 99.3 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 27 - Forks: 5
woodfrog/vse_infty
Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021
Language: Python - Size: 3.91 MB - Last synced: 7 months ago - Pushed: about 1 year ago - Stars: 136 - Forks: 18
gorjanradevski/SMHA
My master thesis: Siamese multi-hop attention for cross-modal retrieval.
Language: Python - Size: 2.76 MB - Last synced: 7 months ago - Pushed: about 4 years ago - Stars: 5 - Forks: 0
peri044/STT
A multi-task model which does image captioning, sentence paraphrasing and cross-modal retrieval.
Language: Python - Size: 103 KB - Last synced: 7 months ago - Pushed: over 4 years ago - Stars: 18 - Forks: 6
CLT29/semantic_neighborhoods
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval [ECCV 2020]
Language: Python - Size: 3.17 MB - Last synced: 7 months ago - Pushed: over 3 years ago - Stars: 9 - Forks: 6
WendellGul/AGAH
Source code for paper "Adversary Guided Asymmetric Hashing for Cross-Modal Retrieval".
Language: Python - Size: 553 KB - Last synced: about 1 month ago - Pushed: over 4 years ago - Stars: 36 - Forks: 11
mariyahendriksen/ecir23-object-centric-vs-scene-centric-CMR
This repository contains the code for the paper "Object-centric vs. Scene-centric Image-Text Cross-modal Retrieval: A Reproducibility Study" published at ECIR 2023.
Language: Python - Size: 12.3 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 4 - Forks: 0
penghu-cs/MvLDAN
Multi-view Linear Discriminant Analysis Network for Cross-modal Retrieval and Cross-view Recognition (Keras&Theano Code)
Language: Python - Size: 38.5 MB - Last synced: 7 months ago - Pushed: over 4 years ago - Stars: 14 - Forks: 5
huycq1712/ViTAA (fork of Jarr0d/ViTAA)
ViTAA: Visual-Textual Attributes Alignment in Person Search by Natural Language
Language: Python - Size: 68.4 KB - Last synced: 8 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0
penghu-cs/MRL
Learning Cross-Modal Retrieval with Noisy Labels (CVPR 2021, PyTorch Code)
Language: Python - Size: 23.9 MB - Last synced: 7 months ago - Pushed: about 1 year ago - Stars: 44 - Forks: 10
LivXue/GNN4CMR
PyTorch implementation of the AAAI-21 paper "Dual Adversarial Label-aware Graph Neural Networks for Cross-modal Retrieval" and the TPAMI-22 paper "Integrating Multi-Label Contrastive Learning with Dual Adversarial Graph Neural Networks for Cross-Modal Retrieval".
Language: Python - Size: 596 KB - Last synced: 8 months ago - Pushed: over 1 year ago - Stars: 24 - Forks: 3
GuanRunwei/VehicleFinder-CTIM
Language: Python - Size: 7.13 MB - Last synced: 9 months ago - Pushed: 10 months ago - Stars: 3 - Forks: 0
LivXue/ALGCN
This repository contains the author's implementation in PyTorch for the paper "Adaptive Label-aware Graph Convolutional Networks for Cross-Modal Retrieval".
Language: Python - Size: 906 KB - Last synced: 8 months ago - Pushed: over 2 years ago - Stars: 9 - Forks: 3
klean2050/EEG_CrossModal
[ICASSP 2022] EEG - Music Cross Modal Learning
Language: Python - Size: 849 KB - Last synced: 10 months ago - Pushed: about 2 years ago - Stars: 7 - Forks: 1
ict-bigdatalab/VNEL
Dataset and code for EMNLP 2022 "Visual Named Entity Linking: A New Dataset and A Baseline"
Size: 4.91 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 17 - Forks: 0
frank-chris/ImageTextRetrieval
In this work, we implement different cross-modal learning schemes such as Siamese Network, Correlational Network and Deep Cross-Modal Projection Learning model and study their performance. We also propose a modified Deep Cross-Modal Projection Learning model that uses a different image feature extractor. We evaluate the model’s performance on image-text retrieval on a fashion clothing dataset.
Language: Jupyter Notebook - Size: 6.88 MB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 9 - Forks: 2
PreferredAI/sml
Code for the paper "Sentiment-Oriented Metric Learning for Text-to-Image Retrieval", ECIR'21
Language: Python - Size: 958 KB - Last synced: 22 days ago - Pushed: over 2 years ago - Stars: 3 - Forks: 0
mako443/Text2Pos-CVPR2022
Code, dataset and models for our CVPR 2022 publication "Text2Pos"
Language: Python - Size: 450 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 30 - Forks: 3
BrandonHanx/TextReID
[BMVC 2021] Text-Based Person Search with Limited Data
Language: Python - Size: 96.7 KB - Last synced: about 1 year ago - Pushed: almost 2 years ago - Stars: 37 - Forks: 7
ilaria-manco/muscall
Official implementation of "Contrastive Audio-Language Learning for Music" (ISMIR 2022)
Language: Python - Size: 193 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 63 - Forks: 5
AyanKumarBhunia/on-the-fly-FGSBIR
[CVPR 2020, Oral] "Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval", IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020.
Language: Python - Size: 20.7 MB - Last synced: about 1 year ago - Pushed: about 3 years ago - Stars: 55 - Forks: 13
kaylode/tern
Cross-modal retrieval using Transformer Encoder Reasoning Networks (TERN), with metric learning and FAISS for fast similarity search on GPU.
Language: Jupyter Notebook - Size: 7.23 MB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 6 - Forks: 1
zhouyu1996/DAQN
An implementation of our paper “Deep Adversarial Quantization Network for Cross-Modal Retrieval”
Language: Python - Size: 42 KB - Last synced: about 1 year ago - Pushed: about 3 years ago - Stars: 10 - Forks: 3
AkChen/UDIH
TensorFlow implementation of UDIH
Language: Python - Size: 39.1 KB - Last synced: about 1 year ago - Pushed: almost 4 years ago - Stars: 2 - Forks: 1
penghu-cs/MAN
Multimodal Adversarial Network for Cross-modal Retrieval (PyTorch Code)
Language: Python - Size: 8.43 MB - Last synced: about 1 year ago - Pushed: about 4 years ago - Stars: 26 - Forks: 6
penghu-cs/SDML
Scalable deep multimodal learning for cross-modal retrieval (SIGIR 2019, PyTorch Code)
Language: Python - Size: 23.5 MB - Last synced: about 1 year ago - Pushed: almost 4 years ago - Stars: 30 - Forks: 13
penghu-cs/DCHN
Joint Versus Independent Multiview Hashing for Cross-View Retrieval (IEEE TCYB 2021, PyTorch Code)
Language: Python - Size: 267 KB - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 4 - Forks: 0
penghu-cs/ISVN
Deep Semisupervised Cross-modal Retrieval/Cross-view Recognition (IEEE TCYB 2022, PyTorch Code)
Language: Python - Size: 1.45 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 3 - Forks: 0
gorjanradevski/vsepp_tensorflow
Implementation of "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives" in TensorFlow.
Language: Python - Size: 49.8 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 5 - Forks: 0
idealwhite/VLDeformer
PyTorch implementation of the paper "VLDeformer: Vision Language Decomposed Transformer for Fast Cross-modal Retrieval", KBS 2022
Language: Jupyter Notebook - Size: 2.42 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 27 - Forks: 3
SahilC/Cross-Modal-Style
An attempt to transfer sentence to image style.
Language: Python - Size: 27.6 MB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0
PrithivirajDamodaran/WhatTheFood
An intentionally simple Image to Food cross-modal search. Created by Prithiviraj Damodaran.
Size: 1000 Bytes - Last synced: about 1 year ago - Pushed: over 2 years ago - Stars: 2 - Forks: 0
b7GsWQMA2XDrdR/VNEL
VNEL (Visual Named Entity Linking) is a new task that takes an image alone as input and performs entity linking on it, focusing on CBIR, cross-modal retrieval, and multimodal fusion.
Size: 2.26 MB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 1 - Forks: 0
gorjanradevski/cross_modal_full_transfer
PyTorch code for cross-modal retrieval on Flickr8k/30k using BERT and EfficientNet
Language: Python - Size: 72.3 KB - Last synced: about 1 year ago - Pushed: about 4 years ago - Stars: 3 - Forks: 1
LongLong-Jing/XMV
PyTorch implementation for Self-supervised Modal and View Invariant Feature Learning
Size: 7.27 MB - Last synced: 12 months ago - Pushed: almost 4 years ago - Stars: 0 - Forks: 0
dingyh0626/KDD-Cup-Multimodalities-Recall
KDD Cup 2020
Language: Python - Size: 283 KB - Last synced: about 1 year ago - Pushed: almost 4 years ago - Stars: 6 - Forks: 1
hthoai/image-text-matching
Image-Text Matching Model Zoo
Language: Python - Size: 12.7 MB - Last synced: about 1 year ago - Pushed: about 3 years ago - Stars: 1 - Forks: 2
frank-chris/Image-Text-Retrieval-Web-App
Flask Web App for ES-654 Machine Learning course project
Language: Python - Size: 135 KB - Last synced: about 1 year ago - Pushed: about 3 years ago - Stars: 1 - Forks: 2
ranarag/ZSCRGAN
Language: Python - Size: 681 KB - Last synced: about 1 year ago - Pushed: over 1 year ago - Stars: 5 - Forks: 1
sontung/hci-intermodal-reasoning
Lab course (Fachpraktikum) project for a human-computer interaction course
Language: Jupyter Notebook - Size: 6.12 MB - Last synced: about 1 year ago - Pushed: about 4 years ago - Stars: 0 - Forks: 1