Topic: "visual-question-answering"

salesforce/BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Language: Jupyter Notebook - Size: 6.34 MB - Last synced at: 29 days ago - Pushed at: 9 months ago - Stars: 5,165 - Forks: 681

OFA-Sys/OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Language: Python - Size: 120 MB - Last synced at: 24 days ago - Pushed at: about 1 year ago - Stars: 2,491 - Forks: 249

peteanderson80/bottom-up-attention

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

Language: Jupyter Notebook - Size: 13.4 MB - Last synced at: 6 months ago - Pushed at: over 2 years ago - Stars: 1,429 - Forks: 379

lucidrains/flamingo-pytorch

Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of DeepMind, in PyTorch

Language: Python - Size: 212 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 1,237 - Forks: 62

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

Language: Python - Size: 12.2 MB - Last synced at: 26 days ago - Pushed at: about 2 years ago - Stars: 970 - Forks: 105

richard-peng-xia/awesome-multimodal-in-medical-imaging

A collection of resources on applications of multi-modal learning in medical imaging.

Size: 234 KB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 730 - Forks: 65

jnhwkim/ban-vqa 📦

Bilinear attention networks for visual question answering

Language: Python - Size: 1.21 MB - Last synced at: 25 days ago - Pushed at: over 1 year ago - Stars: 545 - Forks: 100

MILVLG/mcan-vqa

Deep Modular Co-Attention Networks for Visual Question Answering

Language: Python - Size: 1.84 MB - Last synced at: 12 months ago - Pushed at: over 4 years ago - Stars: 432 - Forks: 88

MMMU-Benchmark/MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Language: Python - Size: 186 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 413 - Forks: 34

zjukg/KG-MM-Survey

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

Size: 82.3 MB - Last synced at: 27 days ago - Pushed at: 5 months ago - Stars: 401 - Forks: 19

davidmascharka/tbd-nets

PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

Language: Jupyter Notebook - Size: 21.8 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 348 - Forks: 74

MILVLG/openvqa

A lightweight, scalable, and general framework for visual question answering research

Language: Python - Size: 833 KB - Last synced at: 12 months ago - Pushed at: over 3 years ago - Stars: 307 - Forks: 64

lupantech/MathVista

MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts

Language: Jupyter Notebook - Size: 50.2 MB - Last synced at: 26 days ago - Pushed at: 5 months ago - Stars: 292 - Forks: 48

MILVLG/prophet

Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

Language: Python - Size: 1.09 MB - Last synced at: 12 months ago - Pushed at: almost 2 years ago - Stars: 259 - Forks: 27

HanXinzi-AI/awesome-computer-vision-resources

A collection of computer vision projects and tools.

Size: 49.8 MB - Last synced at: 2 days ago - Pushed at: 11 months ago - Stars: 246 - Forks: 33

Cyanogenoid/pytorch-vqa

Strong baseline for visual question answering

Language: Python - Size: 21.5 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 239 - Forks: 99

markdtw/vqa-winner-cvprw-2017

PyTorch implementation of the winner of the VQA Challenge Workshop at CVPR'17

Language: Python - Size: 25.4 KB - Last synced at: 3 days ago - Pushed at: about 6 years ago - Stars: 163 - Forks: 38

Yushi-Hu/tifa

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

Language: Python - Size: 6.08 MB - Last synced at: 16 days ago - Pushed at: about 1 year ago - Stars: 159 - Forks: 9

qiantianwen/NuScenes-QA

[AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.

Language: Python - Size: 1.57 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 152 - Forks: 1

zhegan27/VILLA

Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part

Language: Python - Size: 849 KB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 119 - Forks: 14

antoyang/just-ask

[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

Language: Jupyter Notebook - Size: 917 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 98 - Forks: 13

anisha2102/docvqa

Document Visual Question Answering

Language: Python - Size: 146 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 85 - Forks: 20

MMStar-Benchmark/MMStar

This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"

Language: Python - Size: 3.41 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 84 - Forks: 1

mesnico/RelationNetworks-CLEVR

A PyTorch implementation of "A simple neural network module for relational reasoning", working on the CLEVR dataset

Language: Python - Size: 3.68 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 83 - Forks: 26

showlab/LOVA3

(NeurIPS 2024) Official PyTorch implementation of LOVA3

Language: Python - Size: 6.01 MB - Last synced at: 29 days ago - Pushed at: about 2 months ago - Stars: 81 - Forks: 2

mlvlab/Flipped-VQA

Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)

Language: Python - Size: 1.24 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 75 - Forks: 10

antoyang/FrozenBiLM

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

Language: Python - Size: 88.9 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 72 - Forks: 13

violetteshev/bottom-up-features

Bottom-up features extractor implemented in PyTorch.

Language: Python - Size: 178 KB - Last synced at: 6 months ago - Pushed at: over 5 years ago - Stars: 71 - Forks: 19

Shivanshu-Gupta/Visual-Question-Answering

CNN+LSTM, Attention based, and MUTAN-based models for Visual Question Answering

Language: Python - Size: 2.9 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 68 - Forks: 18

rentainhe/TRAR-VQA

[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"

Language: Python - Size: 927 KB - Last synced at: 26 days ago - Pushed at: over 3 years ago - Stars: 66 - Forks: 18

China-UK-ZSL/ZS-F-VQA

[ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph

Language: Python - Size: 37.3 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 65 - Forks: 15

DenisDsh/VizWiz-VQA-PyTorch

PyTorch VQA implementation that achieved top performances in the (ECCV18) VizWiz Grand Challenge: Answering Visual Questions from Blind People

Language: Jupyter Notebook - Size: 347 KB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 60 - Forks: 19

badripatro/PQG

Code for the paper "Learning Semantic Sentence Embeddings using Pair-wise Discriminator" (COLING 2018)

Language: Jupyter Notebook - Size: 14.3 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 53 - Forks: 9

SKTBrain/KVQA

Korean Visual Question Answering

Size: 3.58 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 51 - Forks: 4

mapluisch/LLaVA-CLI-with-multiple-images

LLaVA inference with multiple images at once for cross-image analysis.

Language: Python - Size: 24 MB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 50 - Forks: 4

ai-forever/fusion_brain_aij2021

Creating multimodal multitask models

Language: Jupyter Notebook - Size: 4.29 MB - Last synced at: 19 days ago - Pushed at: over 2 years ago - Stars: 50 - Forks: 15

AdrianBZG/llama-multimodal-vqa

Multimodal Instruction Tuning for Llama 3

Language: Python - Size: 31.3 KB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 48 - Forks: 10

allenai/aokvqa

Official repository for the A-OKVQA dataset

Language: Python - Size: 2.62 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 47 - Forks: 5

Glaciohound/VCML

PyTorch implementation of the paper "Visual Concept-Metaconcept Learner", NeurIPS 2019

Language: Python - Size: 2.43 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 47 - Forks: 7

aioz-ai/MICCAI19-MedVQA

AIOZ AI - Overcoming Data Limitation in Medical Visual Question Answering (MICCAI 2019)

Language: Python - Size: 137 KB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 43 - Forks: 26

lucidrains/AoA-pytorch

A PyTorch implementation of the Attention on Attention module (both self and guided variants), for Visual Question Answering

Language: Python - Size: 39.1 KB - Last synced at: 8 days ago - Pushed at: over 4 years ago - Stars: 42 - Forks: 5

lupantech/dual-mfa-vqa

Co-attending Regions and Detections for VQA.

Language: Matlab - Size: 1.44 MB - Last synced at: 29 days ago - Pushed at: almost 7 years ago - Stars: 40 - Forks: 14

jialinwu17/self_critical_vqa

Code for NeurIPS 2019 paper "Self-Critical Reasoning for Robust Visual Question Answering"

Language: Python - Size: 74.6 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 39 - Forks: 9

aioz-ai/ICCV19_VQA-CTI

Compact Trilinear Interaction for Visual Question Answering (ICCV 2019)

Language: Python - Size: 818 KB - Last synced at: 11 months ago - Pushed at: over 2 years ago - Stars: 38 - Forks: 8

paarthneekhara/convolutional-vqa

Language: Python - Size: 108 KB - Last synced at: 8 days ago - Pushed at: over 7 years ago - Stars: 38 - Forks: 12

caffeinism/FiLM-pytorch

PyTorch implementation of FiLM: Visual Reasoning with a General Conditioning Layer

Language: Python - Size: 33 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 37 - Forks: 6

VisualWebBench/VisualWebBench

Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"

Language: Python - Size: 3.17 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 36 - Forks: 1

Cloud-CV/vilbert-multi-task

12-in-1: Multi-Task Vision and Language Representation Learning Web Demo

Language: Python - Size: 1.27 MB - Last synced at: about 15 hours ago - Pushed at: over 2 years ago - Stars: 35 - Forks: 5

ivonajdenkoska/multimodal-meta-learn

Official code repository for "Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning" (published at ICLR 2023).

Language: Python - Size: 13.8 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 34 - Forks: 1

pramodkaushik/acl18_results

Code to reproduce results in our ACL 2018 paper "Did the Model Understand the Question?"

Size: 1.4 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 34 - Forks: 6

vmichals/FigureQA-baseline

TensorFlow implementation of the CNN-LSTM, Relation Network and text-only baselines for the paper "FigureQA: An Annotated Figure Dataset for Visual Reasoning"

Language: Python - Size: 26.4 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 34 - Forks: 8

vzhou842/easy-VQA

The Easy Visual Question Answering dataset.

Language: Python - Size: 9.5 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 33 - Forks: 11

Toloka/WSDMCup2023

Toloka Visual Question Answering Challenge at WSDM Cup 2023

Language: Jupyter Notebook - Size: 5.24 MB - Last synced at: 25 days ago - Pushed at: about 1 year ago - Stars: 32 - Forks: 7

JunweiLiang/FVTA_MemexQA

Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19

Language: Python - Size: 723 KB - Last synced at: 19 days ago - Pushed at: almost 6 years ago - Stars: 32 - Forks: 15

mbzuai-oryx/Camel-Bench

[NAACL 2025 🔥] CAMEL-Bench is an Arabic benchmark for evaluating multimodal models across eight domains with 29,000 questions.

Language: Python - Size: 14 MB - Last synced at: 7 days ago - Pushed at: 21 days ago - Stars: 31 - Forks: 1

noagarcia/knowit-rock

ROCK model for Knowledge-Based VQA in Videos

Language: Python - Size: 347 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 31 - Forks: 5

MileBench/MileBench

This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"

Language: Python - Size: 3.52 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 29 - Forks: 1

badripatro/awesome-vqg

Visual Question Generation reading list

Size: 18.6 KB - Last synced at: 1 day ago - Pushed at: over 4 years ago - Stars: 29 - Forks: 4

TIGER-AI-Lab/VIEScore

Visual Instruction-guided Explainable Metric. Code for "Towards Explainable Metrics for Conditional Image Synthesis Evaluation" (ACL 2024 main)

Language: Python - Size: 21.6 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 28 - Forks: 1

sdc17/CrossGET

[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.

Language: Python - Size: 11.6 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 26 - Forks: 0

cdancette/detect-shortcuts

Repo for ICCV 2021 paper: Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering

Language: Python - Size: 58.6 KB - Last synced at: 26 days ago - Pushed at: 10 months ago - Stars: 26 - Forks: 1

basakbuluz/Visual-Question-Answering

Visual Question Answering Demo and Algorithmia API

Language: Jupyter Notebook - Size: 53 MB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 26 - Forks: 6

hackerchenzhuo/LaKo

[IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection

Language: Python - Size: 64.9 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 25 - Forks: 3

visual-haystacks/vhs_benchmark

🔥 [ICLR 2025] Official Benchmark Toolkits for "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark"

Language: Python - Size: 5.22 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 24 - Forks: 1

aligholami/hexia

Mid-level PyTorch Based Framework for Visual Question Answering.

Language: Python - Size: 20 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 24 - Forks: 2

SKTBrain/BAN-KVQA

Bilinear Attention Networks for Korean Visual Question Answering

Language: Python - Size: 1.8 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 22 - Forks: 4

ExplainableML/CLEVR-X

CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations

Language: Python - Size: 2.27 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 21 - Forks: 1

zhegan27/LXMERT-AdvTrain

Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": LXMERT adversarial training part

Language: Python - Size: 898 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 21 - Forks: 0

kkahatapitiya/LangRepo

Language Repository for Long Video Understanding

Language: Python - Size: 613 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 20 - Forks: 3

noagarcia/ROLL-VideoQA

PyTorch code for ROLL, a knowledge-based video story question answering model.

Language: Python - Size: 522 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 20 - Forks: 4

abachaa/VQA-Med-2021

VQA-Med 2021

Language: Python - Size: 46.7 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 19 - Forks: 3

Axe--/Visual-Question-Answering

PyTorch Implementation of VQA Baseline & Hierarchical Co-Attention model

Language: Python - Size: 364 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 5

badripatro/awesome-visual-dialog

Visual Dialog

Size: 16.6 KB - Last synced at: 11 days ago - Pushed at: over 4 years ago - Stars: 16 - Forks: 1

sominw/vqamd_floyd

Visual Question Answering through modal dialogue + API

Language: Jupyter Notebook - Size: 19.9 MB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 9

noagarcia/awesome-vqa-pytorch

List of PyTorch repositories for visual question answering

Size: 1.95 KB - Last synced at: 3 days ago - Pushed at: almost 6 years ago - Stars: 15 - Forks: 2

mlvlab/OVQA

Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 2023)

Language: Python - Size: 619 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 13 - Forks: 0

Neerajj9/Stacked-Attention-Networks-for-Visual-Question-Answering

Implementation of the paper "Stacked Attention Networks for Image Question Answering" in TensorFlow

Language: Python - Size: 15.3 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 13 - Forks: 4

RachanaJayaram/Cross-Attention-VizWiz-VQA

A self-evident application of the VQA task is to design systems that aid blind people with sight-reliant queries. The VizWiz VQA dataset originates from images and questions compiled by members of the visually impaired community and, as such, highlights some of the challenges presented by this particular use case.

Language: Python - Size: 2.55 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 6