GitHub topics: video-text-retrieval

Repositories

Paranioar/Awesome_Matching_Pretraining_Transfering

A paper list covering large multi-modality models (perception, generation, unification), parameter-efficient finetuning, vision-language pretraining, and conventional image-text matching, intended for preliminary insight.

Size: 369 KB - Last synced at: 4 days ago - Pushed at: 6 months ago - Stars: 422 - Forks: 48

microsoft/UniVL

An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"

Language: Python - Size: 219 KB - Last synced at: 4 days ago - Pushed at: 11 months ago - Stars: 355 - Forks: 58

alipay/Ant-Multi-Modal-Framework

Research Code for Multimodal-Cognition Team in Ant Group

Language: Python - Size: 18.8 MB - Last synced at: 22 days ago - Pushed at: about 1 month ago - Stars: 147 - Forks: 5

salesforce/ALPRO 📦

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

Language: Python - Size: 311 KB - Last synced at: about 2 months ago - Pushed at: about 2 months ago - Stars: 187 - Forks: 17

amazon-science/crossmodal-contrastive-learning

CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021

Language: Python - Size: 766 KB - Last synced at: about 2 months ago - Pushed at: over 3 years ago - Stars: 63 - Forks: 11

ArrowLuo/CLIP4Clip

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

Language: Python - Size: 1.61 MB - Last synced at: 3 months ago - Pushed at: about 1 year ago - Stars: 929 - Forks: 126

m-bain/CondensedMovies

Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset. [ACCV'20]

Language: Python - Size: 22 MB - Last synced at: 3 months ago - Pushed at: almost 3 years ago - Stars: 175 - Forks: 28

whwu95/Cap4Video

【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

Language: Python - Size: 8.58 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 240 - Forks: 20

unitaryai/VTC

VTC: Improving Video-Text Retrieval with User Comments

Language: Python - Size: 5.45 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 11 - Forks: 0

RenShuhuai-Andy/TESTA

[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding

Language: Python - Size: 835 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 40 - Forks: 3

unitaryai/VTC-dataset

Language: Python - Size: 38.1 KB - Last synced at: 10 months ago - Pushed at: about 1 year ago - Stars: 0 - Forks: 0

xuguohai/X-CLIP

An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"

Language: Python - Size: 1.57 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 55 - Forks: 9

shufangxun/MAC

An end-to-end masked contrastive video-and-language pre-training framework

Size: 1.96 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 23 - Forks: 0

LeapLabTHU/Cross-Modal-Adapter

[arXiv] Cross-Modal Adapter for Text-Video Retrieval

Size: 3.39 MB - Last synced at: over 2 years ago - Pushed at: over 2 years ago - Stars: 29 - Forks: 2

rn-snehapriya/Automatic-Note-Taking-From-Video-Using-Tesseract-OCR

Text from the video is extracted and saved into a .docx file in the form of notes.

Language: Jupyter Notebook - Size: 6.84 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 4 - Forks: 1

Related Keywords

video-text-retrieval (15), msrvtt (4), video-understanding (4), vision-and-language (4), activitynet (3), clip (3), didemo (3), multimodal (3), video-language (3), dataset (2), pretraining (2), video (2), multimodality (2), video-text (2), multimodal-learning (2), vision-language-pretraining (2), retrieval (2), machine-learning (2), image-text-retrieval (2), video-text-recognition (2), msvd (2), contrastive-learning (2), lsmdc (2), pytorch (2), ranking (1), retrieval-model (1), search (1), video-clip-retrieval (1), automatic-note-taking (1), ocr (1), tesseract-ocr (1), video-to-text (1), adapter (1), vision-transformer (1), mae (1), end-to-end-learning (1), vision-language-dataset (1), video-qa (1), deep-learning (1), long-video-understanding (1), vision-language-transformer (1), multimodal-deep-learning (1), comments (1), parameter-efficient-learning (1), video-language-understanding (1), parameter-efficient-tuning (1), cross-modal-learning (1), source-videos (1), precomputed-features (1), video-captioning (1), caption (1), alignment (1), visual-semantic-embedding (1), tutorial (1), text-to-video-generation (1), text-to-image-synthesis (1), text-to-image-generation (1), parameter-efficient-fine-tuning (1), multimodal-pretraining (1), multimodal-large-language-models (1), memory-efficient-tuning (1), large-vision-models (1), large-vision-language-models (1), large-language-models (1), large-language-model (1), image-text-matching (1), cross-modal-retrieval (1), awesome-list (1), transformers (1), natural-language-processing (1), multi-modality (1), computer-vision (1), video-question-answering (1), representation-learning (1), prompt-learning (1), video-editing (1), multimodal-llm (1), youcookii (1), segmentation (1), retrieval-task (1), pretrain (1), multimodal-sentiment-analysis (1), localization (1), joint (1), coin (1), caption-task (1)
