Topic: "video-text-retrieval"
ArrowLuo/CLIP4Clip
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Language: Python - Size: 1.61 MB - Last synced at: about 1 month ago - Pushed at: about 1 year ago - Stars: 929 - Forks: 126

Paranioar/Awesome_Matching_Pretraining_Transfering
A paper list covering large multi-modality models (perception, generation, unification), parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching, for preliminary insight.
Size: 369 KB - Last synced at: 10 days ago - Pushed at: 5 months ago - Stars: 425 - Forks: 48

microsoft/UniVL
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
Language: Python - Size: 219 KB - Last synced at: 4 days ago - Pushed at: 9 months ago - Stars: 351 - Forks: 56

whwu95/Cap4Video
【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
Language: Python - Size: 8.58 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 240 - Forks: 20

salesforce/ALPRO 📦
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Language: Python - Size: 311 KB - Last synced at: 3 days ago - Pushed at: 3 days ago - Stars: 187 - Forks: 17

m-bain/CondensedMovies
Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20]
Language: Python - Size: 22 MB - Last synced at: 27 days ago - Pushed at: over 2 years ago - Stars: 175 - Forks: 28

alipay/Ant-Multi-Modal-Framework
Research code for the Multimodal-Cognition Team at Ant Group
Language: Python - Size: 17 MB - Last synced at: 9 days ago - Pushed at: 10 months ago - Stars: 143 - Forks: 5

amazon-science/crossmodal-contrastive-learning
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021
Language: Python - Size: 766 KB - Last synced at: 1 day ago - Pushed at: about 3 years ago - Stars: 63 - Forks: 11

xuguohai/X-CLIP
An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"
Language: Python - Size: 1.57 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 55 - Forks: 9

RenShuhuai-Andy/TESTA
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
Language: Python - Size: 835 KB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 40 - Forks: 3

LeapLabTHU/Cross-Modal-Adapter
[arXiv] Cross-Modal Adapter for Text-Video Retrieval
Size: 3.39 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 29 - Forks: 2

shufangxun/MAC
An end-to-end masked contrastive video-and-language pre-training framework
Size: 1.96 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 23 - Forks: 0

unitaryai/VTC
VTC: Improving Video-Text Retrieval with User Comments
Language: Python - Size: 5.45 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 0

rn-snehapriya/Automatic-Note-Taking-From-Video-Using-Tesseract-OCR
Text from the video is extracted and saved into a .docx file in the form of notes.
Language: Jupyter Notebook - Size: 6.84 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 4 - Forks: 1

unitaryai/VTC-dataset
Language: Python - Size: 38.1 KB - Last synced at: 8 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0
