GitHub topics: video-question-answering
zhousheng97/ViTXT-GQA
✨✨ Scene-Text Grounding for Text-Based Video Question Answering (arxiv)
Language: Python - Size: 6.97 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 15 - Forks: 1

bytedance/Shot2Story
A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
Language: Python - Size: 153 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 128 - Forks: 6

OpenGVLab/Ask-Anything
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
Language: Python - Size: 20.7 MB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 3,213 - Forks: 261

Vision-CAIR/MiniGPT4-video
Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding
Language: Python - Size: 38.7 MB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 609 - Forks: 67

apple/ml-slowfast-llava
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
Language: Python - Size: 375 KB - Last synced at: 9 days ago - Pushed at: 7 months ago - Stars: 210 - Forks: 13

jayleicn/TVQAplus
[ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering
Language: Python - Size: 14 MB - Last synced at: 4 days ago - Pushed at: over 2 years ago - Stars: 129 - Forks: 24

jpthu17/HBI
[CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Language: Python - Size: 51 MB - Last synced at: 13 days ago - Pushed at: 4 months ago - Stars: 116 - Forks: 5

bcmi/Causal-VidQA
[CVPR 2022] A large-scale public benchmark dataset for video question-answering, especially about evidence and commonsense reasoning. The code used in our paper "From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering", CVPR2022.
Language: Python - Size: 20.6 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 57 - Forks: 4

OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Language: Python - Size: 53.3 MB - Last synced at: 18 days ago - Pushed at: about 2 months ago - Stars: 1,771 - Forks: 106

salesforce/ALPRO
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Language: Python - Size: 310 KB - Last synced at: 12 days ago - Pushed at: over 2 years ago - Stars: 186 - Forks: 17

jpthu17/EMCL
[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Language: Python - Size: 23.9 MB - Last synced at: 20 days ago - Pushed at: about 1 year ago - Stars: 132 - Forks: 9

jayleicn/ClipBERT
[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.
Language: Python - Size: 73.2 KB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 718 - Forks: 86

mlvlab/Flipped-VQA
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
Language: Python - Size: 1.24 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 75 - Forks: 10

Pavansomisetty21/Multimodal-AI-Agent-for-Video-Understanding-and-Research-using-Gemini-LLM
A multimodal AI agent for video understanding and research: ask any question about a video and it will answer.
Language: Jupyter Notebook - Size: 4.21 MB - Last synced at: 20 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

mmazab/LifeQA
Data and PyTorch code for the LifeQA LREC 2020 paper.
Language: Python - Size: 13.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 1

X-PLUG/mPLUG-2
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)
Language: Python - Size: 2.36 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 223 - Forks: 19

X-PLUG/Youku-mPLUG
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks
Language: Python - Size: 15.1 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 292 - Forks: 11

declare-lab/Sealing
[NAACL 2024] Official Implementation of paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"
Language: Python - Size: 8.92 MB - Last synced at: 5 days ago - Pushed at: 9 months ago - Stars: 11 - Forks: 3

chakravarthi589/Video-Question-Answering_Resources
Video Question Answering | Video QA | VQA
Size: 546 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 9 - Forks: 5

lyuchenyang/Efficient-VideoQA
Code for ACL SustaiNLP 2023 paper "Is a Video worth n × n Images? A Highly Efficient Approach to Transformer-based Video Question Answering"
Language: Python - Size: 28.3 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

lyuchenyang/Semantic-aware-VideoQA
Code for ACL SRW 2023 paper "Semantic-aware Dynamic Retrospective-Prospective Reasoning for Event-level Video Question Answering"
Language: Python - Size: 31.3 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

jena-shreyas/Efficient-VidQA
Part of my work for my Bachelor's Thesis Project on Counterfactual Reasoning for Videos.
Language: Python - Size: 11.5 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

ecoxial2007/EffVideoQA
Efficient Video Question Answering
Language: Python - Size: 3.74 MB - Last synced at: 8 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

doc-doc/NExT-QA
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
Language: Python - Size: 6.67 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 114 - Forks: 11

doc-doc/NExT-GQA
Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
Language: Python - Size: 5.64 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 33 - Forks: 1

whwu95/FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
Language: Python - Size: 3.22 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 34 - Forks: 0

mlvlab/OVQA
Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 2023)
Language: Python - Size: 619 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 13 - Forks: 0

mlvlab/MELTR
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)
Language: Python - Size: 1.13 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 30 - Forks: 6

zchoi/PKOL
[TIP 2022] Official code of paper “Video Question Answering with Prior Knowledge and Object-sensitive Learning”
Language: Python - Size: 505 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 44 - Forks: 0

tsujuifu/pytorch_violet
A PyTorch implementation of VIOLET
Language: Python - Size: 115 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 130 - Forks: 7

tsujuifu/pytorch_empirical-mvm
A PyTorch implementation of EmpiricalMVM
Language: Python - Size: 449 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 30 - Forks: 2

doc-doc/CoVGT
Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23)
Language: Python - Size: 6.01 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 14 - Forks: 1

sail-sg/VGT
Video Graph Transformer for Video Question Answering (ECCV'22)
Language: Python - Size: 454 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 38 - Forks: 9

XLiu443/Tem-adapter
[ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer
Language: Python - Size: 141 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 0

MichiganNLP/wildqa
WildQA website code
Language: HTML - Size: 832 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

MichiganNLP/lifeqa
LifeQA website code
Language: HTML - Size: 795 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 0

noagarcia/ROLL-VideoQA
PyTorch code for ROLL, a knowledge-based video story question answering model.
Language: Python - Size: 522 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 20 - Forks: 4

AmrHendy/multimedia_question_answering
A simple attention deep learning model to answer questions about a given video with the most relevant video intervals as answers.
Language: Python - Size: 780 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 1 - Forks: 4

antoyang/just-ask
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Language: Jupyter Notebook - Size: 917 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 98 - Forks: 13

doc-doc/HQGA
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)
Language: Python - Size: 37.2 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 23 - Forks: 3

noagarcia/knowit-rock
ROCK model for Knowledge-Based VQA in Videos
Language: Python - Size: 347 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 31 - Forks: 5

yl3800/IGV
This repo contains code for Invariant Grounding for Video Question Answering
Language: Python - Size: 706 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 19 - Forks: 2

antoyang/FrozenBiLM
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Language: Python - Size: 88.9 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 72 - Forks: 13

liveseongho/DramaQA
DramaQA Starter Code (2021)
Language: Python - Size: 69.9 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 3

gzcsudo/MSPAN-VideoQA
Multi-Scale Progressive Attention Network for Video Question Answering
Language: Python - Size: 81.1 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

SteveImmanuel/LRCE-VQA
Novel lightweight multi-modal encoder for various tasks in computer vision
Language: Python - Size: 266 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

nicolas-dufour/video-question-answering
Given a video, we are able to automatically answer questions about what is happening in the video.
Language: Jupyter Notebook - Size: 3 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

Abdelrhman-Yasser/multimedia_question_answering (fork of AmrHendy/multimedia_question_answering)
A simple attention deep learning model to answer questions about a given video with the most relevant video intervals as answers.
Language: Python - Size: 782 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 3 - Forks: 0
