GitHub topics: video-text-retrieval
microsoft/UniVL
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
Language: Python - Size: 219 KB - Last synced at: about 9 hours ago - Pushed at: 9 months ago - Stars: 351 - Forks: 56

Paranioar/Awesome_Matching_Pretraining_Transfering
The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insight.
Size: 369 KB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 425 - Forks: 48

ArrowLuo/CLIP4Clip
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
Language: Python - Size: 1.61 MB - Last synced at: 17 days ago - Pushed at: about 1 year ago - Stars: 929 - Forks: 126

salesforce/ALPRO
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Language: Python - Size: 310 KB - Last synced at: 13 days ago - Pushed at: over 2 years ago - Stars: 186 - Forks: 17

alipay/Ant-Multi-Modal-Framework
Research Code for Multimodal-Cognition Team in Ant Group
Language: Python - Size: 17 MB - Last synced at: 24 days ago - Pushed at: 9 months ago - Stars: 138 - Forks: 5

m-bain/CondensedMovies
Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20]
Language: Python - Size: 22 MB - Last synced at: 13 days ago - Pushed at: over 2 years ago - Stars: 175 - Forks: 28

amazon-science/crossmodal-contrastive-learning
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021
Language: Python - Size: 766 KB - Last synced at: 13 days ago - Pushed at: about 3 years ago - Stars: 62 - Forks: 11

whwu95/Cap4Video
【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
Language: Python - Size: 8.58 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 240 - Forks: 20

unitaryai/VTC
VTC: Improving Video-Text Retrieval with User Comments
Language: Python - Size: 5.45 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 11 - Forks: 0

RenShuhuai-Andy/TESTA
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
Language: Python - Size: 835 KB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 40 - Forks: 3

unitaryai/VTC-dataset
Language: Python - Size: 38.1 KB - Last synced at: 8 months ago - Pushed at: 12 months ago - Stars: 0 - Forks: 0

xuguohai/X-CLIP
An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"
Language: Python - Size: 1.57 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 55 - Forks: 9

shufangxun/MAC
An end-to-end masked contrastive video-and-language pre-training framework
Size: 1.96 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 23 - Forks: 0

LeapLabTHU/Cross-Modal-Adapter
[arXiv] Cross-Modal Adapter for Text-Video Retrieval
Size: 3.39 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 29 - Forks: 2

rn-snehapriya/Automatic-Note-Taking-From-Video-Using-Tesseract-OCR
Text from the video is extracted and saved into a .docx file in the form of notes.
Language: Jupyter Notebook - Size: 6.84 KB - Last synced at: about 2 years ago - Pushed at: about 3 years ago - Stars: 4 - Forks: 1
