Topic: "visual-question-answering"
salesforce/BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Language: Jupyter Notebook - Size: 6.34 MB - Last synced at: 29 days ago - Pushed at: 9 months ago - Stars: 5,165 - Forks: 681

OFA-Sys/OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Language: Python - Size: 120 MB - Last synced at: 24 days ago - Pushed at: about 1 year ago - Stars: 2,491 - Forks: 249

peteanderson80/bottom-up-attention
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Language: Jupyter Notebook - Size: 13.4 MB - Last synced at: 6 months ago - Pushed at: over 2 years ago - Stars: 1,429 - Forks: 379

lucidrains/flamingo-pytorch
Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch
Language: Python - Size: 212 KB - Last synced at: about 1 month ago - Pushed at: over 2 years ago - Stars: 1,237 - Forks: 62

YehLi/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
Language: Python - Size: 12.2 MB - Last synced at: 26 days ago - Pushed at: about 2 years ago - Stars: 970 - Forks: 105

richard-peng-xia/awesome-multimodal-in-medical-imaging
A collection of resources on applications of multi-modal learning in medical imaging.
Size: 234 KB - Last synced at: 3 days ago - Pushed at: about 1 month ago - Stars: 730 - Forks: 65

jnhwkim/ban-vqa
Bilinear attention networks for visual question answering
Language: Python - Size: 1.21 MB - Last synced at: 25 days ago - Pushed at: over 1 year ago - Stars: 545 - Forks: 100

MILVLG/mcan-vqa
Deep Modular Co-Attention Networks for Visual Question Answering
Language: Python - Size: 1.84 MB - Last synced at: 12 months ago - Pushed at: over 4 years ago - Stars: 432 - Forks: 88

MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Language: Python - Size: 186 MB - Last synced at: 23 days ago - Pushed at: 23 days ago - Stars: 413 - Forks: 34

zjukg/KG-MM-Survey
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
Size: 82.3 MB - Last synced at: 27 days ago - Pushed at: 5 months ago - Stars: 401 - Forks: 19

davidmascharka/tbd-nets
PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
Language: Jupyter Notebook - Size: 21.8 MB - Last synced at: about 1 month ago - Pushed at: over 3 years ago - Stars: 348 - Forks: 74

MILVLG/openvqa
A lightweight, scalable, and general framework for visual question answering research
Language: Python - Size: 833 KB - Last synced at: 12 months ago - Pushed at: over 3 years ago - Stars: 307 - Forks: 64

lupantech/MathVista
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
Language: Jupyter Notebook - Size: 50.2 MB - Last synced at: 26 days ago - Pushed at: 5 months ago - Stars: 292 - Forks: 48

MILVLG/prophet
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
Language: Python - Size: 1.09 MB - Last synced at: 12 months ago - Pushed at: almost 2 years ago - Stars: 259 - Forks: 27

HanXinzi-AI/awesome-computer-vision-resources
A collection of computer vision projects and tools.
Size: 49.8 MB - Last synced at: 2 days ago - Pushed at: 11 months ago - Stars: 246 - Forks: 33

Cyanogenoid/pytorch-vqa
Strong baseline for visual question answering
Language: Python - Size: 21.5 KB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 239 - Forks: 99

markdtw/vqa-winner-cvprw-2017
Pytorch implementation of the winner of the VQA Challenge Workshop at CVPR'17
Language: Python - Size: 25.4 KB - Last synced at: 3 days ago - Pushed at: about 6 years ago - Stars: 163 - Forks: 38

Yushi-Hu/tifa
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Language: Python - Size: 6.08 MB - Last synced at: 16 days ago - Pushed at: about 1 year ago - Stars: 159 - Forks: 9

qiantianwen/NuScenes-QA
[AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.
Language: Python - Size: 1.57 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 152 - Forks: 1

zhegan27/VILLA
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part
Language: Python - Size: 849 KB - Last synced at: 5 months ago - Pushed at: over 4 years ago - Stars: 119 - Forks: 14

antoyang/just-ask
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Language: Jupyter Notebook - Size: 917 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 98 - Forks: 13

anisha2102/docvqa
Document Visual Question Answering
Language: Python - Size: 146 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 85 - Forks: 20

MMStar-Benchmark/MMStar
This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
Language: Python - Size: 3.41 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 84 - Forks: 1

mesnico/RelationNetworks-CLEVR
A pytorch implementation for "A simple neural network module for relational reasoning", working on the CLEVR dataset
Language: Python - Size: 3.68 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 83 - Forks: 26

showlab/LOVA3
(NeurIPS 2024) Official PyTorch implementation of LOVA3
Language: Python - Size: 6.01 MB - Last synced at: 29 days ago - Pushed at: about 2 months ago - Stars: 81 - Forks: 2

mlvlab/Flipped-VQA
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
Language: Python - Size: 1.24 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 75 - Forks: 10

antoyang/FrozenBiLM
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Language: Python - Size: 88.9 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 72 - Forks: 13

violetteshev/bottom-up-features
Bottom-up features extractor implemented in PyTorch.
Language: Python - Size: 178 KB - Last synced at: 6 months ago - Pushed at: over 5 years ago - Stars: 71 - Forks: 19

Shivanshu-Gupta/Visual-Question-Answering
CNN+LSTM, Attention based, and MUTAN-based models for Visual Question Answering
Language: Python - Size: 2.9 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 68 - Forks: 18

rentainhe/TRAR-VQA
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
Language: Python - Size: 927 KB - Last synced at: 26 days ago - Pushed at: over 3 years ago - Stars: 66 - Forks: 18

China-UK-ZSL/ZS-F-VQA
[Paper][ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph
Language: Python - Size: 37.3 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 65 - Forks: 15

DenisDsh/VizWiz-VQA-PyTorch
PyTorch VQA implementation that achieved top performances in the (ECCV18) VizWiz Grand Challenge: Answering Visual Questions from Blind People
Language: Jupyter Notebook - Size: 347 KB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 60 - Forks: 19

badripatro/PQG
Code for the paper "Learning Semantic Sentence Embeddings using Pair-wise Discriminator" (COLING 2018)
Language: Jupyter Notebook - Size: 14.3 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 53 - Forks: 9

SKTBrain/KVQA
Korean Visual Question Answering
Size: 3.58 MB - Last synced at: over 1 year ago - Pushed at: about 5 years ago - Stars: 51 - Forks: 4

mapluisch/LLaVA-CLI-with-multiple-images
LLaVA inference with multiple images at once for cross-image analysis.
Language: Python - Size: 24 MB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 50 - Forks: 4

ai-forever/fusion_brain_aij2021
Creating multimodal multitask models
Language: Jupyter Notebook - Size: 4.29 MB - Last synced at: 19 days ago - Pushed at: over 2 years ago - Stars: 50 - Forks: 15

AdrianBZG/llama-multimodal-vqa
Multimodal Instruction Tuning for Llama 3
Language: Python - Size: 31.3 KB - Last synced at: 8 days ago - Pushed at: about 1 year ago - Stars: 48 - Forks: 10

allenai/aokvqa
Official repository for the A-OKVQA dataset
Language: Python - Size: 2.62 MB - Last synced at: 12 months ago - Pushed at: 12 months ago - Stars: 47 - Forks: 5

Glaciohound/VCML
PyTorch implementation of the paper "Visual Concept-Metaconcept Learner", NeurIPS 2019
Language: Python - Size: 2.43 MB - Last synced at: over 1 year ago - Pushed at: over 5 years ago - Stars: 47 - Forks: 7

aioz-ai/MICCAI19-MedVQA
AIOZ AI - Overcoming Data Limitation in Medical Visual Question Answering (MICCAI 2019)
Language: Python - Size: 137 KB - Last synced at: 11 months ago - Pushed at: over 1 year ago - Stars: 43 - Forks: 26

lucidrains/AoA-pytorch
A Pytorch implementation of Attention on Attention module (both self and guided variants), for Visual Question Answering
Language: Python - Size: 39.1 KB - Last synced at: 8 days ago - Pushed at: over 4 years ago - Stars: 42 - Forks: 5

lupantech/dual-mfa-vqa
Co-attending Regions and Detections for VQA.
Language: Matlab - Size: 1.44 MB - Last synced at: 29 days ago - Pushed at: almost 7 years ago - Stars: 40 - Forks: 14

jialinwu17/self_critical_vqa
Code for NeurIPS 2019 paper "Self-Critical Reasoning for Robust Visual Question Answering"
Language: Python - Size: 74.6 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 39 - Forks: 9

aioz-ai/ICCV19_VQA-CTI
Compact Trilinear Interaction for Visual Question Answering (ICCV 2019)
Language: Python - Size: 818 KB - Last synced at: 11 months ago - Pushed at: over 2 years ago - Stars: 38 - Forks: 8

paarthneekhara/convolutional-vqa
Language: Python - Size: 108 KB - Last synced at: 8 days ago - Pushed at: over 7 years ago - Stars: 38 - Forks: 12

caffeinism/FiLM-pytorch
PyTorch implementation of FiLM: Visual Reasoning with a General Conditioning Layer
Language: Python - Size: 33 MB - Last synced at: about 2 years ago - Pushed at: over 5 years ago - Stars: 37 - Forks: 6

VisualWebBench/VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
Language: Python - Size: 3.17 MB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 36 - Forks: 1

Cloud-CV/vilbert-multi-task
12-in-1: Multi-Task Vision and Language Representation Learning Web Demo
Language: Python - Size: 1.27 MB - Last synced at: about 15 hours ago - Pushed at: over 2 years ago - Stars: 35 - Forks: 5

ivonajdenkoska/multimodal-meta-learn
Official code repository for "Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning" (published at ICLR 2023).
Language: Python - Size: 13.8 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 34 - Forks: 1

pramodkaushik/acl18_results
Code to reproduce results in our ACL 2018 paper "Did the Model Understand the Question?"
Size: 1.4 MB - Last synced at: about 2 years ago - Pushed at: almost 7 years ago - Stars: 34 - Forks: 6

vmichals/FigureQA-baseline
TensorFlow implementation of the CNN-LSTM, Relation Network and text-only baselines for the paper "FigureQA: An Annotated Figure Dataset for Visual Reasoning"
Language: Python - Size: 26.4 KB - Last synced at: over 1 year ago - Pushed at: about 7 years ago - Stars: 34 - Forks: 8

vzhou842/easy-VQA
The Easy Visual Question Answering dataset.
Language: Python - Size: 9.5 MB - Last synced at: 3 days ago - Pushed at: over 1 year ago - Stars: 33 - Forks: 11

Toloka/WSDMCup2023
Toloka Visual Question Answering Challenge at WSDM Cup 2023
Language: Jupyter Notebook - Size: 5.24 MB - Last synced at: 25 days ago - Pushed at: about 1 year ago - Stars: 32 - Forks: 7

JunweiLiang/FVTA_MemexQA
Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19
Language: Python - Size: 723 KB - Last synced at: 19 days ago - Pushed at: almost 6 years ago - Stars: 32 - Forks: 15

mbzuai-oryx/Camel-Bench
[NAACL 2025] CAMEL-Bench is an Arabic benchmark for evaluating multimodal models across eight domains with 29,000 questions.
Language: Python - Size: 14 MB - Last synced at: 7 days ago - Pushed at: 21 days ago - Stars: 31 - Forks: 1

noagarcia/knowit-rock
ROCK model for Knowledge-Based VQA in Videos
Language: Python - Size: 347 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 31 - Forks: 5

MileBench/MileBench
This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"
Language: Python - Size: 3.52 MB - Last synced at: 3 months ago - Pushed at: 10 months ago - Stars: 29 - Forks: 1

badripatro/awesome-vqg
Visual Question Generation reading list
Size: 18.6 KB - Last synced at: 1 day ago - Pushed at: over 4 years ago - Stars: 29 - Forks: 4

TIGER-AI-Lab/VIEScore
Visual Instruction-guided Explainable Metric. Code for "Towards Explainable Metrics for Conditional Image Synthesis Evaluation" (ACL 2024 main)
Language: Python - Size: 21.6 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 28 - Forks: 1

sdc17/CrossGET
[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.
Language: Python - Size: 11.6 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 26 - Forks: 0

cdancette/detect-shortcuts
Repo for ICCV 2021 paper: Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering
Language: Python - Size: 58.6 KB - Last synced at: 26 days ago - Pushed at: 10 months ago - Stars: 26 - Forks: 1

basakbuluz/Visual-Question-Answering
Visual Question Answering Demo and Algorithmia API
Language: Jupyter Notebook - Size: 53 MB - Last synced at: about 1 year ago - Pushed at: about 6 years ago - Stars: 26 - Forks: 6

hackerchenzhuo/LaKo
[Paper][IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection
Language: Python - Size: 64.9 MB - Last synced at: 5 months ago - Pushed at: about 1 year ago - Stars: 25 - Forks: 3

visual-haystacks/vhs_benchmark
[ICLR 2025] Official Benchmark Toolkits for "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark"
Language: Python - Size: 5.22 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 24 - Forks: 1

aligholami/hexia
Mid-level PyTorch Based Framework for Visual Question Answering.
Language: Python - Size: 20 MB - Last synced at: over 1 year ago - Pushed at: almost 2 years ago - Stars: 24 - Forks: 2

SKTBrain/BAN-KVQA
Bilinear Attention Networks for Korean Visual Question Answering
Language: Python - Size: 1.8 MB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 22 - Forks: 4

ExplainableML/CLEVR-X
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
Language: Python - Size: 2.27 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 21 - Forks: 1

zhegan27/LXMERT-AdvTrain
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": LXMERT adversarial training part
Language: Python - Size: 898 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 21 - Forks: 0

kkahatapitiya/LangRepo
Language Repository for Long Video Understanding
Language: Python - Size: 613 KB - Last synced at: 11 months ago - Pushed at: 11 months ago - Stars: 20 - Forks: 3

noagarcia/ROLL-VideoQA
PyTorch code for ROLL, a knowledge-based video story question answering model.
Language: Python - Size: 522 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 20 - Forks: 4

abachaa/VQA-Med-2021
VQA-Med 2021
Language: Python - Size: 46.7 MB - Last synced at: about 1 month ago - Pushed at: almost 3 years ago - Stars: 19 - Forks: 3

Axe--/Visual-Question-Answering
PyTorch Implementation of VQA Baseline & Hierarchical Co-Attention model
Language: Python - Size: 364 KB - Last synced at: about 1 year ago - Pushed at: over 1 year ago - Stars: 16 - Forks: 5

badripatro/awesome-visual-dialog
Visual Dialog
Size: 16.6 KB - Last synced at: 11 days ago - Pushed at: over 4 years ago - Stars: 16 - Forks: 1

sominw/vqamd_floyd
Visual Question Answering through modal dialogue + API
Language: Jupyter Notebook - Size: 19.9 MB - Last synced at: 3 days ago - Pushed at: over 2 years ago - Stars: 15 - Forks: 9

noagarcia/awesome-vqa-pytorch
List of PyTorch repositories for visual question answering
Size: 1.95 KB - Last synced at: 3 days ago - Pushed at: almost 6 years ago - Stars: 15 - Forks: 2

mlvlab/OVQA
Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 2023)
Language: Python - Size: 619 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 13 - Forks: 0

Neerajj9/Stacked-Attention-Networks-for-Visual-Question-Answering
Implementation of the paper "Stacked Attention Networks for Image Question Answering" in Tensorflow
Language: Python - Size: 15.3 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 13 - Forks: 4

RachanaJayaram/Cross-Attention-VizWiz-VQA
A self-evident application of the VQA task is to design systems that aid blind people with sight-reliant queries. The VizWiz VQA dataset originates from images and questions compiled by members of the visually impaired community and, as such, highlights some of the challenges presented by this particular use case.
Language: Python - Size: 2.55 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 12 - Forks: 6

securade/sentinel
Securade.ai Sentinel - A monitoring and surveillance application that enables visual Q&A and video captioning for existing CCTV cameras.
Language: Python - Size: 29.3 KB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 11 - Forks: 4

engindeniz/vitis
[ICCV 2023 CLVL Workshop] Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
Language: Python - Size: 270 KB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 11 - Forks: 0

bowen-upenn/Multi-Agent-VQA
[CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering
Language: Python - Size: 10.6 MB - Last synced at: 2 days ago - Pushed at: 8 months ago - Stars: 11 - Forks: 0

jacobmarks/vqa-plugin
Perform visual question answering on your images
Language: Python - Size: 34.2 KB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 2

XingruiWang/3D-Aware-VQA
Official Code for the NeurIPS'23 paper "3D-Aware Visual Question Answering about Parts, Poses and Occlusions"
Language: Jupyter Notebook - Size: 29.5 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 11 - Forks: 0

gicheonkang/sglkt-visdial
PyTorch Implementation for EMNLP'21 Findings "Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer"
Language: Python - Size: 1.68 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 11 - Forks: 4

anujanegi/VQA
Visual Question Answering System
Language: Python - Size: 181 MB - Last synced at: 12 months ago - Pushed at: over 5 years ago - Stars: 11 - Forks: 0

maj34/Eye-Handicapped-Service
[X:AI Conference] Guide dog service for the visually impaired
Language: Jupyter Notebook - Size: 31.4 MB - Last synced at: 12 months ago - Pushed at: over 1 year ago - Stars: 10 - Forks: 1

abachaa/VQA-Med-2020
VQA-Med 2020
Language: Python - Size: 62.9 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 10 - Forks: 2

shikamaru-96/Visual-Question-Answering
Implementation of the visual question answering model from the paper "Exploring Models and Data for Image Question Answering".
Language: Python - Size: 8.1 MB - Last synced at: about 1 year ago - Pushed at: about 7 years ago - Stars: 10 - Forks: 3

DigitalPhonetics/Intrinsic-Subgraph-Generation-for-VQA
Predicting a subgraph alongside the answer in a graph based VQA model
Language: Python - Size: 302 KB - Last synced at: 19 days ago - Pushed at: 4 months ago - Stars: 9 - Forks: 1

badripatro/MDN-VQG
Size: 3.4 MB - Last synced at: almost 2 years ago - Pushed at: over 5 years ago - Stars: 9 - Forks: 3

badripatro/DVQA
Size: 17.5 MB - Last synced at: about 2 years ago - Pushed at: about 6 years ago - Stars: 9 - Forks: 3

kingsdigitallab/kdl-vqa
Python tool for batch visual question answering (BVQA).
Language: Python - Size: 44.3 MB - Last synced at: about 3 hours ago - Pushed at: about 4 hours ago - Stars: 8 - Forks: 0

visual-haystacks/mirage
[ICLR 2025] Official PyTorch Model "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark"
Language: Python - Size: 11.2 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 8 - Forks: 0

double125/Graph-Matching-Attention
Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering
Language: Python - Size: 2.5 MB - Last synced at: about 1 year ago - Pushed at: about 2 years ago - Stars: 8 - Forks: 1

MunzerDw/DLVC-3DVQA
This is the official repository of our report 3D Visual Question Answering by Leonard Schenk and Munzer Dwedari for the course Deep Learning in Visual Computing.
Language: Python - Size: 18.6 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 6 - Forks: 0

guoyang9/vqa-prior
Implementation for our SIGIR 2019 paper "Quantifying and Alleviating the Language Prior Problem in Visual Question Answering".
Language: Python - Size: 33.2 KB - Last synced at: almost 2 years ago - Pushed at: about 5 years ago - Stars: 6 - Forks: 3

kaist-cvml/I-HallA-v1.0
[AAAI 2025] Official Implementation of I-HallA v1.0
Language: Python - Size: 49.6 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 5 - Forks: 1

marialymperaiou/knowledge-enhanced-multimodal-learning
A list of research papers on knowledge-enhanced multimodal learning
Size: 20.5 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

kaylode/vqa-transformer
Visual Question Answering using Transformer and Bottom-Up attention. Implemented in Pytorch
Language: Python - Size: 111 MB - Last synced at: about 2 years ago - Pushed at: over 3 years ago - Stars: 5 - Forks: 1

aioz-ai/ECCVW20_MILQT
Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering (ECCVW 2020)
Language: Python - Size: 1.58 MB - Last synced at: 11 months ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 0
