GitHub topics: video-question-answering

Repositories

zhousheng97/ViTXT-GQA

✨✨ Scene-Text Grounding for Text-Based Video Question Answering (arxiv)

Language: Python - Size: 6.97 MB - Last synced at: 1 day ago - Pushed at: 2 days ago - Stars: 15 - Forks: 1

bytedance/Shot2Story

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

Language: Python - Size: 153 MB - Last synced at: 6 days ago - Pushed at: 3 months ago - Stars: 128 - Forks: 6

OpenGVLab/Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Language: Python - Size: 20.7 MB - Last synced at: 9 days ago - Pushed at: 3 months ago - Stars: 3,213 - Forks: 261

Vision-CAIR/MiniGPT4-video

Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding

Language: Python - Size: 38.7 MB - Last synced at: 9 days ago - Pushed at: 4 months ago - Stars: 609 - Forks: 67

apple/ml-slowfast-llava

SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models

Language: Python - Size: 375 KB - Last synced at: 9 days ago - Pushed at: 7 months ago - Stars: 210 - Forks: 13

jayleicn/TVQAplus

[ACL 2020] PyTorch code for TVQA+: Spatio-Temporal Grounding for Video Question Answering

Language: Python - Size: 14 MB - Last synced at: 4 days ago - Pushed at: over 2 years ago - Stars: 129 - Forks: 24

jpthu17/HBI

[CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

Language: Python - Size: 51 MB - Last synced at: 13 days ago - Pushed at: 4 months ago - Stars: 116 - Forks: 5

[CVPR 2022] A large-scale public benchmark dataset for video question-answering, especially about evidence and commonsense reasoning. The code used in our paper "From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering", CVPR2022.

Language: Python - Size: 20.6 MB - Last synced at: 19 days ago - Pushed at: 19 days ago - Stars: 57 - Forks: 4

OpenGVLab/InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Language: Python - Size: 53.3 MB - Last synced at: 18 days ago - Pushed at: about 2 months ago - Stars: 1,771 - Forks: 106

salesforce/ALPRO

Align and Prompt: Video-and-Language Pre-training with Entity Prompts

Language: Python - Size: 310 KB - Last synced at: 12 days ago - Pushed at: over 2 years ago - Stars: 186 - Forks: 17

jpthu17/EMCL

[NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

Language: Python - Size: 23.9 MB - Last synced at: 20 days ago - Pushed at: about 1 year ago - Stars: 132 - Forks: 9

jayleicn/ClipBERT

[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.

Language: Python - Size: 73.2 KB - Last synced at: 15 days ago - Pushed at: over 1 year ago - Stars: 718 - Forks: 86

mlvlab/Flipped-VQA

Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)

Language: Python - Size: 1.24 MB - Last synced at: 24 days ago - Pushed at: 24 days ago - Stars: 75 - Forks: 10

Pavansomisetty21/Multimodal-AI-Agent-for-Video-Understanding-and-Research-using-Gemini-LLM

A multimodal AI agent for video understanding and research: ask any question about a video and it will answer.

Language: Jupyter Notebook - Size: 4.21 MB - Last synced at: 20 days ago - Pushed at: about 1 month ago - Stars: 0 - Forks: 0

mmazab/LifeQA

Data and PyTorch code for the LifeQA LREC 2020 paper.

Language: Python - Size: 13.2 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 1

X-PLUG/mPLUG-2

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video (ICML 2023)

Language: Python - Size: 2.36 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 223 - Forks: 19

X-PLUG/Youku-mPLUG

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks

Language: Python - Size: 15.1 MB - Last synced at: about 1 month ago - Pushed at: over 1 year ago - Stars: 292 - Forks: 11

declare-lab/Sealing

[NAACL 2024] Official Implementation of paper "Self-Adaptive Sampling for Efficient Video Question Answering on Image-Text Models"

Language: Python - Size: 8.92 MB - Last synced at: 5 days ago - Pushed at: 9 months ago - Stars: 11 - Forks: 3

chakravarthi589/Video-Question-Answering_Resources

Video Question Answering | Video QA | VQA

Size: 546 KB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 9 - Forks: 5

lyuchenyang/Efficient-VideoQA

Code for ACL SustaiNLP 2023 paper "Is a Video worth n × n Images? A Highly Efficient Approach to Transformer-based Video Question Answering"

Language: Python - Size: 28.3 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 2 - Forks: 0

lyuchenyang/Semantic-aware-VideoQA

Code for ACL SRW 2023 paper "Semantic-aware Dynamic Retrospective-Prospective Reasoning for Event-level Video Question Answering"

Language: Python - Size: 31.3 KB - Last synced at: about 1 month ago - Pushed at: almost 2 years ago - Stars: 3 - Forks: 0

jena-shreyas/Efficient-VidQA

Part of my work for my Bachelor's Thesis Project on Counterfactual Reasoning for Videos.

Language: Python - Size: 11.5 MB - Last synced at: 7 months ago - Pushed at: 7 months ago - Stars: 0 - Forks: 0

ecoxial2007/EffVideoQA

Efficient Video Question Answering

Language: Python - Size: 3.74 MB - Last synced at: 8 months ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

doc-doc/NExT-QA

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

Language: Python - Size: 6.67 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 114 - Forks: 11

doc-doc/NExT-GQA

Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)

Language: Python - Size: 5.64 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 33 - Forks: 1

whwu95/FreeVA

FreeVA: Offline MLLM as Training-Free Video Assistant

Language: Python - Size: 3.22 MB - Last synced at: 10 months ago - Pushed at: 10 months ago - Stars: 34 - Forks: 0

mlvlab/OVQA

Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 2023)

Language: Python - Size: 619 KB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 13 - Forks: 0

mlvlab/MELTR

MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)

Language: Python - Size: 1.13 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 30 - Forks: 6

zchoi/PKOL

[TIP 2022] Official code of paper “Video Question Answering with Prior Knowledge and Object-sensitive Learning”

Language: Python - Size: 505 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 44 - Forks: 0

tsujuifu/pytorch_violet

A PyTorch implementation of VIOLET

Language: Python - Size: 115 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 130 - Forks: 7

tsujuifu/pytorch_empirical-mvm

A PyTorch implementation of EmpiricalMVM

Language: Python - Size: 449 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 30 - Forks: 2

doc-doc/CoVGT

Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23)

Language: Python - Size: 6.01 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 14 - Forks: 1

sail-sg/VGT

Video Graph Transformer for Video Question Answering (ECCV'22)

Language: Python - Size: 454 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 38 - Forks: 9

XLiu443/Tem-adapter

[ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer

Language: Python - Size: 141 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 13 - Forks: 0

MichiganNLP/wildqa

WildQA website code

Language: HTML - Size: 832 KB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

MichiganNLP/lifeqa

LifeQA website code

Language: HTML - Size: 795 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 4 - Forks: 0

noagarcia/ROLL-VideoQA

PyTorch code for ROLL, a knowledge-based video story question answering model.

Language: Python - Size: 522 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 20 - Forks: 4

AmrHendy/multimedia_question_answering

A simple attention deep learning model to answer questions about a given video with the most relevant video intervals as answers.

Language: Python - Size: 780 KB - Last synced at: over 1 year ago - Pushed at: almost 6 years ago - Stars: 1 - Forks: 4

antoyang/just-ask

[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

Language: Jupyter Notebook - Size: 917 KB - Last synced at: over 1 year ago - Pushed at: over 2 years ago - Stars: 98 - Forks: 13

doc-doc/HQGA

Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)

Language: Python - Size: 37.2 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 23 - Forks: 3

noagarcia/knowit-rock

ROCK model for Knowledge-Based VQA in Videos

Language: Python - Size: 347 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 31 - Forks: 5

yl3800/IGV

This repo contains code for Invariant Grounding for Video Question Answering

Language: Python - Size: 706 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 19 - Forks: 2

antoyang/FrozenBiLM

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

Language: Python - Size: 88.9 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 72 - Forks: 13

liveseongho/DramaQA

DramaQA Starter Code (2021)

Language: Python - Size: 69.9 MB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 9 - Forks: 3

gzcsudo/MSPAN-VideoQA

Multi-Scale Progressive Attention Network for Video Question Answering

Language: Python - Size: 81.1 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

SteveImmanuel/LRCE-VQA

Novel lightweight multi-modal encoder for various tasks in computer vision

Language: Python - Size: 266 KB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 0 - Forks: 0

nicolas-dufour/video-question-answering

Given a video, we are able to automatically answer questions about what is happening in the video.

Language: Jupyter Notebook - Size: 3 MB - Last synced at: almost 2 years ago - Pushed at: almost 4 years ago - Stars: 0 - Forks: 0

Abdelrhman-Yasser/multimedia_question_answering (fork of AmrHendy/multimedia_question_answering)

A simple attention deep learning model to answer questions about a given video with the most relevant video intervals as answers.

Language: Python - Size: 782 KB - Last synced at: about 2 years ago - Pushed at: almost 6 years ago - Stars: 3 - Forks: 0

Related Keywords

video-question-answering (48), videoqa (11), video-retrieval (11), video-understanding (10), pytorch (9), deep-learning (9), visual-question-answering (7), dataset (6), vqa (6), computer-vision (6), vision-and-language (6), machine-learning (5), vision-language (5), nlp (5), natural-language-processing (4), multimodal-large-language-models (4), video (4), pre-training (4), video-captioning (4), research (4), large-language-models (4), multimodal (4), question-answering (3), videos (3), youtube (3), foundation-models (3), video-language-understanding (3), benchmark (3), multi-modal-learning (3), multi-modal (3), lrec2020 (2), lrec (2), lifeqa (2), real-life (2), neurips (2), cvpr2023 (2), mllm (2), contrastive-learning (2), artificial-intelligence (2), multimodal-pretraining (2), video-question-answering-dataset (2), trustworthy-vqa (2), video-language (2), cvpr (2), cross-modal-retrieval (2), multimodal-learning (2), weakly-supervised-learning (2), chatgpt (2), multimodal-deep-learning (1), scene-understanding (1), visual-language-learning (1), causal-temporal-action-reasoning (1), multi-object-interaction (1), video-grounding (1), visual-evidence-grounding (1), chatbot (1), llava (1), training-free (1), vision-language-model (1), zero-shot-video-captioning (1), iccv2023 (1), meta-learning (1), multimedia-retrieval (1), video-description (1), video-processing (1), visual-deep-learning (1), question-generation (1), conditional-graph-hierarchy (1), knowledge-base (1), cvpr-2022 (1), cvpr-oral-2022 (1), generalization (1), interpretable (1), invariant-learning (1), acl2021 (1), python (1), python3 (1), tensorflow (1), pytorch-implementation (1), dynamic-visual-graph (1), graph-transformer (1), temporal-dynamics (1), clip-model (1), coling (1), coling2022 (1), in-the-wild (1), natual-language-processing (1), wildqa (1), knowledge-based-reasoning (1), attention-model (1), attention-seq2seq (1), cnn (1), feature-extraction (1), glove-embeddings (1), wacv (1), video-clip (1), temporal-action-localization (1), spatio-temporal-action-localization (1), self-supervised (1), open-set-recognition (1)