Ecosyste.ms: Repos
An open API service providing repository metadata for many open source software ecosystems.
GitHub topics: visual-question-answering
MMMU-Benchmark/MMMU
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Language: Python - Size: 3.34 MB - Last synced: about 4 hours ago - Pushed: about 8 hours ago - Stars: 273 - Forks: 19
aioz-ai/MICCAI19-MedVQA
AIOZ AI - Overcoming Data Limitation in Medical Visual Question Answering (MICCAI 2019)
Language: Python - Size: 137 KB - Last synced: 2 days ago - Pushed: 8 months ago - Stars: 43 - Forks: 26
miguelscarv/pheye
Pheye - a family of efficient small vision-language models
Language: Python - Size: 3.85 MB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 0 - Forks: 0
HanXinzi-AI/awesome-computer-vision-resources
A collection of computer vision projects and tools.
Size: 49.8 MB - Last synced: 9 days ago - Pushed: 11 days ago - Stars: 101 - Forks: 21
aioz-ai/ICCV19_VQA-CTI
Compact Trilinear Interaction for Visual Question Answering (ICCV 2019)
Language: Python - Size: 818 KB - Last synced: 2 days ago - Pushed: over 1 year ago - Stars: 38 - Forks: 8
qiantianwen/NuScenes-QA
[AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.
Size: 1.55 MB - Last synced: 10 days ago - Pushed: 6 months ago - Stars: 129 - Forks: 0
MileBench/MileBench
This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"
Language: Python - Size: 3.51 MB - Last synced: 9 days ago - Pushed: 13 days ago - Stars: 15 - Forks: 1
AdrianBZG/llama-multimodal-vqa
Multimodal Instruction Tuning for Llama 3
Language: Python - Size: 31.3 KB - Last synced: 11 days ago - Pushed: about 1 month ago - Stars: 10 - Forks: 2
lupantech/MathVista
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
Language: Jupyter Notebook - Size: 50 MB - Last synced: 13 days ago - Pushed: 14 days ago - Stars: 182 - Forks: 24
TIGER-AI-Lab/VIEScore
Towards Explainable Metrics for Conditional Image Synthesis Evaluation (ACL 2024)
Language: Python - Size: 7.61 MB - Last synced: 13 days ago - Pushed: 14 days ago - Stars: 15 - Forks: 0
richard-peng-xia/awesome-multimodal-in-medical-imaging
A collection of resources on applications of multi-modal learning in medical imaging.
Size: 273 KB - Last synced: 18 days ago - Pushed: 26 days ago - Stars: 332 - Forks: 33
kHarshit/llm-projects
LLM projects
Language: Jupyter Notebook - Size: 4.69 MB - Last synced: 14 days ago - Pushed: 15 days ago - Stars: 0 - Forks: 0
zjukg/KG-MM-Survey
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
Size: 82.2 MB - Last synced: 15 days ago - Pushed: 15 days ago - Stars: 204 - Forks: 13
hari-huynh/viVQA-voice-assistant
Voice assistant using Multimodal LLMs - LLaVA-NeXT (Mistral 7B) finetuned & PhoWhisper
Language: Python - Size: 5.96 MB - Last synced: 16 days ago - Pushed: 17 days ago - Stars: 3 - Forks: 3
MILVLG/prophet
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
Language: Python - Size: 1.09 MB - Last synced: 16 days ago - Pushed: about 1 year ago - Stars: 259 - Forks: 27
lucidrains/flamingo-pytorch
Implementation of 🦩 Flamingo, DeepMind's state-of-the-art few-shot visual question answering attention network, in PyTorch
Language: Python - Size: 212 KB - Last synced: 17 days ago - Pushed: over 1 year ago - Stars: 1,154 - Forks: 58
OFA-Sys/OFA
Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Language: Python - Size: 120 MB - Last synced: 18 days ago - Pushed: about 1 month ago - Stars: 2,336 - Forks: 245
MILVLG/mcan-vqa
Deep Modular Co-Attention Networks for Visual Question Answering
Language: Python - Size: 1.84 MB - Last synced: 16 days ago - Pushed: over 3 years ago - Stars: 432 - Forks: 88
allenai/aokvqa
Official repository for the A-OKVQA dataset
Language: Python - Size: 2.62 MB - Last synced: 23 days ago - Pushed: 23 days ago - Stars: 47 - Forks: 5
jacobmarks/vqa-plugin
Perform visual question answering on your images
Language: Python - Size: 34.2 KB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 11 - Forks: 2
MILVLG/openvqa
A lightweight, scalable, and general framework for visual question answering research
Language: Python - Size: 833 KB - Last synced: 16 days ago - Pushed: over 2 years ago - Stars: 307 - Forks: 64
reshalfahsi/vqa-clip-lstm
Visual Question Answering Using CLIP + LSTM
Language: Jupyter Notebook - Size: 4.67 MB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 0 - Forks: 0
ellenzhuwang/implicitOOD
An end-to-end vision and language model incorporating explicit knowledge graphs and OOD-detection.
Language: Python - Size: 175 KB - Last synced: 29 days ago - Pushed: 29 days ago - Stars: 2 - Forks: 0
Toloka/WSDMCup2023
Toloka Visual Question Answering Challenge at WSDM Cup 2023
Language: Jupyter Notebook - Size: 5.24 MB - Last synced: 29 days ago - Pushed: about 1 month ago - Stars: 29 - Forks: 7
mlvlab/Flipped-VQA
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
Language: Python - Size: 1.23 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 54 - Forks: 7
mlvlab/OVQA
Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 2023)
Language: Python - Size: 619 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 13 - Forks: 0
davidmascharka/tbd-nets
PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"
Language: Jupyter Notebook - Size: 21.8 MB - Last synced: about 1 month ago - Pushed: over 2 years ago - Stars: 349 - Forks: 74
MMStar-Benchmark/MMStar
This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
Language: Python - Size: 3.41 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 84 - Forks: 1
jnhwkim/ban-vqa 📦
Bilinear attention networks for visual question answering
Language: Python - Size: 1.21 MB - Last synced: about 1 month ago - Pushed: 7 months ago - Stars: 534 - Forks: 101
VisualWebBench/VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
Language: Python - Size: 2.95 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 9 - Forks: 0
Yushi-Hu/tifa
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Language: Python - Size: 6.07 MB - Last synced: 2 months ago - Pushed: 5 months ago - Stars: 107 - Forks: 5
basakbuluz/Visual-Question-Answering
:camera: :question: Visual Question Answering Demo and Algorithmia API
Language: Jupyter Notebook - Size: 53 MB - Last synced: about 2 months ago - Pushed: over 5 years ago - Stars: 26 - Forks: 6
peteanderson80/bottom-up-attention
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Language: Jupyter Notebook - Size: 13.4 MB - Last synced: about 2 months ago - Pushed: over 1 year ago - Stars: 1,401 - Forks: 378
Cyanogenoid/pytorch-vqa
Strong baseline for visual question answering
Language: Python - Size: 21.5 KB - Last synced: about 1 month ago - Pushed: about 1 year ago - Stars: 237 - Forks: 98
gutbash/lmm-graph-vision
How well do the GPT-4V, Gemini Pro Vision, and Claude 3 Opus models perform zero-shot vision tasks on data structures?
Language: Python - Size: 159 MB - Last synced: 24 days ago - Pushed: about 2 months ago - Stars: 1 - Forks: 1
salesforce/BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Language: Jupyter Notebook - Size: 6.34 MB - Last synced: about 2 months ago - Pushed: 8 months ago - Stars: 4,189 - Forks: 558
XingruiWang/3D-Aware-VQA
Official Code for the NeurIPS'23 paper "3D-Aware Visual Question Answering about Parts, Poses and Occlusions"
Language: Jupyter Notebook - Size: 29.5 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 11 - Forks: 0
inferless/Moondream1
Moondream1 is a 1.6B parameter model built using SigLIP, Phi-1.5 and the LLaVA training dataset.
Language: Python - Size: 24.4 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0
inferless/Moondream2
Moondream2 is a small vision language model designed to run efficiently on edge devices.
Language: Python - Size: 27.3 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0
double125/Graph-Matching-Attention
Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering
Language: Python - Size: 2.5 MB - Last synced: 2 months ago - Pushed: over 1 year ago - Stars: 8 - Forks: 1
bowen-upenn/Multi-Agent-VQA
Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering
Language: Python - Size: 9.97 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0
mapluisch/LLaVA-CLI-with-multiple-images
LLaVA inference with multiple images at once for cross-image analysis.
Language: Python - Size: 24 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 12 - Forks: 1
kkahatapitiya/LangRepo
Language Repository for Long Video Understanding
Language: Python - Size: 609 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 6 - Forks: 1
lucidrains/AoA-pytorch
A PyTorch implementation of the Attention on Attention module (both self and guided variants), for Visual Question Answering
Language: Python - Size: 39.1 KB - Last synced: 30 days ago - Pushed: over 3 years ago - Stars: 40 - Forks: 5
shikamaru-96/Visual-Question-Answering
Implementation of the visual question answering model from the paper "Exploring Models and Data for Image Question Answering".
Language: Python - Size: 8.1 MB - Last synced: 2 months ago - Pushed: about 6 years ago - Stars: 10 - Forks: 3
atharva-naik/MMML-TermProject-VizWiz-VQA-Challenge
VizWiz Challenge Term Project for Multi Modal Machine Learning @ CMU (11777)
Language: Python - Size: 90.6 MB - Last synced: 3 months ago - Pushed: 9 months ago - Stars: 0 - Forks: 0
mliu-dark-knight/VQA
Language: Python - Size: 82 KB - Last synced: 3 months ago - Pushed: over 6 years ago - Stars: 1 - Forks: 0
badripatro/awesome-vqg
Visual Question Generation reading list
Size: 18.6 KB - Last synced: 3 days ago - Pushed: almost 4 years ago - Stars: 27 - Forks: 4
winnedatsch/tuw-master-thesis
This repository hosts the code for Jan Hadl's Master's thesis at TU Wien: GS-VQA, a zero-shot visual question answering (VQA) pipeline that uses vision-language models (VLMs) for visual perception and answer-set programming (ASP) for symbolic reasoning.
Language: Jupyter Notebook - Size: 37.3 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0
YehLi/xmodaler
X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).
Language: Python - Size: 12.2 MB - Last synced: 3 months ago - Pushed: over 1 year ago - Stars: 996 - Forks: 124
ai-forever/fusion_brain_aij2021
Creating multimodal multitask models
Language: Jupyter Notebook - Size: 4.29 MB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 47 - Forks: 16
hackerchenzhuo/LaKo
[Paper][IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection
Language: Python - Size: 64.9 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 22 - Forks: 3
thatAverageGuy/EarlyFusion-on-EasyVQA
Streamlit app for demonstrating multimodal (vision + language) modelling in PyTorch.
Language: Python - Size: 2.74 MB - Last synced: 4 months ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0
kHarshit/visual-question-answering
Visual Question Answering System using ViT, GPT, BERT (LLMs)
Language: Jupyter Notebook - Size: 3.62 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0
China-UK-ZSL/ZS-F-VQA
[Paper][ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph
Language: Python - Size: 37.3 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 59 - Forks: 14
Cloud-CV/vilbert-multi-task
:eyes: :speaking_head: :memo: 12-in-1: Multi-Task Vision and Language Representation Learning Web Demo
Language: Python - Size: 1.27 MB - Last synced: about 2 months ago - Pushed: over 1 year ago - Stars: 35 - Forks: 4
Axe--/Visual-Question-Answering
PyTorch Implementation of VQA Baseline & Hierarchical Co-Attention model
Language: Python - Size: 364 KB - Last synced: 3 months ago - Pushed: 8 months ago - Stars: 16 - Forks: 5
markdtw/vqa-winner-cvprw-2017
PyTorch implementation of the winning entry of the VQA Challenge Workshop at CVPR'17
Language: Python - Size: 25.4 KB - Last synced: about 1 month ago - Pushed: over 5 years ago - Stars: 165 - Forks: 38
vzhou842/easy-VQA
The Easy Visual Question Answering dataset.
Language: Python - Size: 9.5 MB - Last synced: about 1 month ago - Pushed: 8 months ago - Stars: 32 - Forks: 11
ghazaleh-mahmoodi/lxmert_compression
B.Sc. Final Project: LXMERT Model Compression for Visual Question Answering.
Language: Python - Size: 11.6 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 2 - Forks: 1
anujanegi/VQA
Visual Question Answering System
Language: Python - Size: 181 MB - Last synced: 11 days ago - Pushed: over 4 years ago - Stars: 11 - Forks: 0
RachanaJayaram/Cross-Attention-VizWiz-VQA
A self-evident application of the VQA task is to design systems that help blind people with sight-reliant queries. The VizWiz VQA dataset originates from images and questions compiled by members of the visually impaired community and, as such, highlights some of the challenges presented by this particular use case.
Language: Python - Size: 2.55 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 12 - Forks: 6
sdc17/UPop
[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.
Language: Python - Size: 2.4 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 54 - Forks: 3
ExplainableML/CLEVR-X
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
Language: Python - Size: 2.27 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 21 - Forks: 1
ivonajdenkoska/multimodal-meta-learn
Official code repository for "Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning" (published at ICLR 2023).
Language: Python - Size: 13.8 MB - Last synced: 7 months ago - Pushed: 12 months ago - Stars: 34 - Forks: 1
SKTBrain/KVQA
Korean Visual Question Answering
Size: 3.58 MB - Last synced: 7 months ago - Pushed: over 4 years ago - Stars: 51 - Forks: 4
rentainhe/TRAR-VQA
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
Language: Python - Size: 927 KB - Last synced: 7 months ago - Pushed: over 2 years ago - Stars: 54 - Forks: 17
JunweiLiang/FVTA_MemexQA
Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19
Language: Python - Size: 723 KB - Last synced: 7 months ago - Pushed: almost 5 years ago - Stars: 33 - Forks: 15
ITE-5th/visual-question-answering
Language: Python - Size: 621 KB - Last synced: 7 months ago - Pushed: over 6 years ago - Stars: 1 - Forks: 1
ITE-5th/eya
Language: Python - Size: 16.9 MB - Last synced: 7 months ago - Pushed: almost 6 years ago - Stars: 0 - Forks: 0
Glaciohound/VCML
PyTorch implementation of the paper "Visual Concept-Metaconcept Learner", NeurIPS 2019
Language: Python - Size: 2.43 MB - Last synced: 7 months ago - Pushed: over 4 years ago - Stars: 47 - Forks: 7
badripatro/awesome-visual-dialog
Visual Dialog
Size: 16.6 KB - Last synced: about 2 months ago - Pushed: almost 4 years ago - Stars: 15 - Forks: 1
aioz-ai/ECCVW20_MILQT
Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering (ECCVW 2020)
Language: Python - Size: 1.58 MB - Last synced: 2 days ago - Pushed: over 3 years ago - Stars: 5 - Forks: 0
sdc17/CrossGET
CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.
Size: 1.19 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 13 - Forks: 0
engindeniz/vitis
[ICCV 2023 CLVL Workshop] Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts
Language: Python - Size: 3.91 KB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0
sepiatone/cv-question_answering
Visual Question Answering is a Computer Vision / Natural Language Processing task whose objective is to be able to provide a natural language answer to a natural language question about a given image
Language: Python - Size: 13.8 MB - Last synced: 8 months ago - Pushed: about 4 years ago - Stars: 1 - Forks: 0
zero-nnkn/vision-assistant-services
👁🗨 Vision Assistant (Backend): Smart Assistant for Visually Impaired People
Language: Python - Size: 21.7 MB - Last synced: 25 days ago - Pushed: 25 days ago - Stars: 2 - Forks: 1
maj34/Eye-Handicapped-Service
[X:AI Conference] 안내見: a guidance service for visually impaired people
Language: Jupyter Notebook - Size: 31.4 MB - Last synced: 10 days ago - Pushed: 9 months ago - Stars: 10 - Forks: 1
zhegan27/VILLA
Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part
Language: Python - Size: 849 KB - Last synced: 7 months ago - Pushed: over 3 years ago - Stars: 115 - Forks: 12
Dafterfly/Quick_Vilt
A CLI and GUI for using the Vision-and-Language Transformer (ViLT) model for visual question answering (answering questions based on an image)
Language: Python - Size: 15.6 KB - Last synced: 4 months ago - Pushed: 9 months ago - Stars: 0 - Forks: 0
suraj-maniyar/VQA-PyTorch
My implementation of VQA in PyTorch
Language: Python - Size: 83.7 MB - Last synced: 9 months ago - Pushed: almost 4 years ago - Stars: 1 - Forks: 0
DelTA-Lab-IITK/PDUN
Probabilistic framework for solving Visual Dialog
Size: 21 MB - Last synced: 10 months ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0
juletx/egunean-behin-vqa
Egunean Behin Visual Question Answering Dataset
Language: Jupyter Notebook - Size: 12.9 MB - Last synced: 10 months ago - Pushed: about 2 years ago - Stars: 2 - Forks: 0
AliMostafaRadwan/VQA_real-time
Visual question answering in real time
Language: Python - Size: 8.79 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 2 - Forks: 1
noagarcia/ROLL-VideoQA
PyTorch code for ROLL, a knowledge-based video story question answering model.
Language: Python - Size: 522 KB - Last synced: 10 months ago - Pushed: over 3 years ago - Stars: 20 - Forks: 4
aligholami/hexia
Mid-level PyTorch Based Framework for Visual Question Answering.
Language: Python - Size: 20 MB - Last synced: 10 months ago - Pushed: 11 months ago - Stars: 24 - Forks: 2
kshitij98/cViL
cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation
Language: Python - Size: 762 KB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0
antoyang/just-ask
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
Language: Jupyter Notebook - Size: 917 KB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 98 - Forks: 13
MunzerDw/DLVC-3DVQA
This is the official repository of our report 3D Visual Question Answering by Leonard Schenk and Munzer Dwedari for the course Deep Learning in Visual Computing.
Language: Python - Size: 18.6 MB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 6 - Forks: 0
avi-jit/RadiologyQA
MSR Cambridge Internship Summer 2023
Language: Vue - Size: 1.97 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 1 - Forks: 0
itsmariodias/bert-mcoatt-vqa
BERT based Multiple Parallel Co-Attention Networks for Visual Question Answering
Language: Python - Size: 481 KB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0
Pegayus/vqa-npi
An extended implementation of Neural Program Interpreter (NPI) for Visual Question Answering (VQA) tasks, built on Python 3.5 and Tensorflow 1.12.0. Includes detailed instructions for replication and customization.
Language: Python - Size: 28.6 MB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0
yousefkotp/Visual-Question-Answering
A lightweight deep learning model with a web application that answers image-based questions using a non-generative approach, built for the VizWiz Grand Challenge 2023 by carefully curating the answer vocabulary and adding a linear layer on top of OpenAI's CLIP model, which serves as the image and text encoder
Language: Jupyter Notebook - Size: 15.9 MB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 1
noagarcia/awesome-vqa-pytorch
List of PyTorch repositories for visual question answering
Size: 1.95 KB - Last synced: 10 months ago - Pushed: almost 5 years ago - Stars: 13 - Forks: 3
OmerShubi/DL_VQA
Visual Question Answering (VQA) Model
Language: Python - Size: 267 KB - Last synced: 12 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 1
guoyang9/vqa-prior
Implementation for our SIGIR 2019 paper --- Quantifying and Alleviating the Language Prior Problem in Visual Question Answering.
Language: Python - Size: 33.2 KB - Last synced: 12 months ago - Pushed: over 4 years ago - Stars: 6 - Forks: 3
abhinav-neil/socratic-models Fork of milenakapralova/socraticmodels
Socratic models for multimodal reasoning & image captioning
Language: Jupyter Notebook - Size: 48.8 MB - Last synced: 12 months ago - Pushed: 12 months ago - Stars: 0 - Forks: 0
UsefGamal/Visual-Question-Answering-VQA
A multimodal project in which a vision model that understands images is combined with an NLP model that understands questions, so that answers are based on both the questions and the images
Language: Jupyter Notebook - Size: 2.19 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0
ParadoxZW/prophet Fork of MILVLG/prophet
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
Language: Python - Size: 1.07 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0
mesnico/RelationNetworks-CLEVR
A PyTorch implementation of "A simple neural network module for relational reasoning", working on the CLEVR dataset
Language: Python - Size: 3.68 MB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 83 - Forks: 26