video-text-retrieval | Topic | Ecosyste.ms: Repos

Topic: "video-text-retrieval"

ArrowLuo/CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Language: Python - Size: 1.61 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 929 - Forks: 126

Paranioar/Awesome_Matching_Pretraining_Transfering

The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insight.

Size: 369 KB - Last synced at: 10 days ago - Pushed at: 5 months ago - Stars: 425 - Forks: 48

microsoft/UniVL

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"

Language: Python - Size: 219 KB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 351 - Forks: 56

whwu95/Cap4Video

【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

Language: Python - Size: 8.58 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 240 - Forks: 20

salesforce/ALPRO 📦

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

Language: Python - Size: 311 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 187 - Forks: 17

m-bain/CondensedMovies

Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20]

Language: Python - Size: 22 MB - Last synced at: 27 days ago - Pushed at: over 2 years ago - Stars: 175 - Forks: 28

alipay/Ant-Multi-Modal-Framework

Research Code for Multimodal-Cognition Team in Ant Group

Language: Python - Size: 17 MB - Last synced at: 9 days ago - Pushed at: 10 months ago - Stars: 143 - Forks: 5

amazon-science/crossmodal-contrastive-learning

CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021

Language: Python - Size: 766 KB - Last synced at: 1 day ago - Pushed at: about 3 years ago - Stars: 63 - Forks: 11

xuguohai/X-CLIP

An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"

Language: Python - Size: 1.57 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 55 - Forks: 9

RenShuhuai-Andy/TESTA

[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding

Language: Python - Size: 835 KB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 40 - Forks: 3

LeapLabTHU/Cross-Modal-Adapter

[arXiv] Cross-Modal Adapter for Text-Video Retrieval

Size: 3.39 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 29 - Forks: 2

shufangxun/MAC

An end-to-end masked contrastive video-and-language pre-training framework

Size: 1.96 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 23 - Forks: 0

unitaryai/VTC

VTC: Improving Video-Text Retrieval with User Comments

Language: Python - Size: 5.45 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 0

rn-snehapriya/Automatic-Note-Taking-From-Video-Using-Tesseract-OCR

Text from the video is extracted and saved into a .docx file in the form of notes.

Language: Jupyter Notebook - Size: 6.84 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 4 - Forks: 1

unitaryai/VTC-dataset

Language: Python - Size: 38.1 KB - Last synced at: 8 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

Related Topics

msrvtt (4), vision-and-language (4), video-understanding (4), activitynet (3), didemo (3), multimodal (3), clip (3), video-language (3), pytorch (2), contrastive-learning (2), pretraining (2), dataset (2), vision-language-pretraining (2), image-text-retrieval (2), retrieval (2), multimodality (2), multimodal-learning (2), video (2), lsmdc (2), msvd (2), machine-learning (2), video-text-recognition (2), video-text (2), coin (1), video-qa (1), long-video-understanding (1), vision-language-transformer (1), multimodal-deep-learning (1), comments (1), video-clip-retrieval (1), search (1), retrieval-model (1), ranking (1), youcookii (1), video-editing (1), multimodal-llm (1), video-captioning (1), joint (1), localization (1), caption-task (1), caption (1), multimodal-sentiment-analysis (1), pretrain (1), alignment (1), video-language-understanding (1), retrieval-task (1), cross-modal-learning (1), source-videos (1), precomputed-features (1), segmentation (1), video-question-answering (1), representation-learning (1), prompt-learning (1), large-language-model (1), image-text-matching (1), cross-modal-retrieval (1), awesome-list (1), vision-language-dataset (1), vision-transformer (1), mae (1), end-to-end-learning (1), parameter-efficient-tuning (1), parameter-efficient-learning (1), deep-learning (1), adapter (1), video-to-text (1), tesseract-ocr (1), ocr (1), automatic-note-taking (1), transformers (1), natural-language-processing (1), multi-modality (1), computer-vision (1), visual-semantic-embedding (1), tutorial (1), text-to-video-generation (1), text-to-image-synthesis (1), text-to-image-generation (1), parameter-efficient-fine-tuning (1), multimodal-pretraining (1), multimodal-large-language-models (1), memory-efficient-tuning (1), large-vision-models (1), large-vision-language-models (1), large-language-models (1)
