Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: visual-question-answering

MMMU-Benchmark/MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Language: Python - Size: 3.34 MB - Last synced: about 4 hours ago - Pushed: about 8 hours ago - Stars: 273 - Forks: 19

aioz-ai/MICCAI19-MedVQA

AIOZ AI - Overcoming Data Limitation in Medical Visual Question Answering (MICCAI 2019)

Language: Python - Size: 137 KB - Last synced: 2 days ago - Pushed: 8 months ago - Stars: 43 - Forks: 26

miguelscarv/pheye

Pheye - a family of efficient small vision-language models

Language: Python - Size: 3.85 MB - Last synced: 1 day ago - Pushed: 2 days ago - Stars: 0 - Forks: 0

HanXinzi-AI/awesome-computer-vision-resources

A collection of computer vision projects and tools.

Size: 49.8 MB - Last synced: 9 days ago - Pushed: 11 days ago - Stars: 101 - Forks: 21

aioz-ai/ICCV19_VQA-CTI

Compact Trilinear Interaction for Visual Question Answering (ICCV 2019)

Language: Python - Size: 818 KB - Last synced: 2 days ago - Pushed: over 1 year ago - Stars: 38 - Forks: 8

qiantianwen/NuScenes-QA

[AAAI 2024] NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario.

Size: 1.55 MB - Last synced: 10 days ago - Pushed: 6 months ago - Stars: 129 - Forks: 0

MileBench/MileBench

This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"

Language: Python - Size: 3.51 MB - Last synced: 9 days ago - Pushed: 13 days ago - Stars: 15 - Forks: 1

AdrianBZG/llama-multimodal-vqa

Multimodal Instruction Tuning for Llama 3

Language: Python - Size: 31.3 KB - Last synced: 11 days ago - Pushed: about 1 month ago - Stars: 10 - Forks: 2

lupantech/MathVista

MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts

Language: Jupyter Notebook - Size: 50 MB - Last synced: 13 days ago - Pushed: 14 days ago - Stars: 182 - Forks: 24

TIGER-AI-Lab/VIEScore

Towards Explainable Metrics for Conditional Image Synthesis Evaluation (ACL 2024)

Language: Python - Size: 7.61 MB - Last synced: 13 days ago - Pushed: 14 days ago - Stars: 15 - Forks: 0

richard-peng-xia/awesome-multimodal-in-medical-imaging

A collection of resources on applications of multi-modal learning in medical imaging.

Size: 273 KB - Last synced: 18 days ago - Pushed: 26 days ago - Stars: 332 - Forks: 33

kHarshit/llm-projects

LLM projects

Language: Jupyter Notebook - Size: 4.69 MB - Last synced: 14 days ago - Pushed: 15 days ago - Stars: 0 - Forks: 0

zjukg/KG-MM-Survey

Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey

Size: 82.2 MB - Last synced: 15 days ago - Pushed: 15 days ago - Stars: 204 - Forks: 13

hari-huynh/viVQA-voice-assistant

Voice assistant using Multimodal LLMs - LLaVA-NeXT (Mistral 7B) finetuned & PhoWhisper

Language: Python - Size: 5.96 MB - Last synced: 16 days ago - Pushed: 17 days ago - Stars: 3 - Forks: 3

MILVLG/prophet

Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

Language: Python - Size: 1.09 MB - Last synced: 16 days ago - Pushed: about 1 year ago - Stars: 259 - Forks: 27

lucidrains/flamingo-pytorch

Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch

Language: Python - Size: 212 KB - Last synced: 17 days ago - Pushed: over 1 year ago - Stars: 1,154 - Forks: 58

OFA-Sys/OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Language: Python - Size: 120 MB - Last synced: 18 days ago - Pushed: about 1 month ago - Stars: 2,336 - Forks: 245

MILVLG/mcan-vqa

Deep Modular Co-Attention Networks for Visual Question Answering

Language: Python - Size: 1.84 MB - Last synced: 16 days ago - Pushed: over 3 years ago - Stars: 432 - Forks: 88

allenai/aokvqa

Official repository for the A-OKVQA dataset

Language: Python - Size: 2.62 MB - Last synced: 23 days ago - Pushed: 23 days ago - Stars: 47 - Forks: 5

jacobmarks/vqa-plugin

Perform visual question answering on your images

Language: Python - Size: 34.2 KB - Last synced: 24 days ago - Pushed: 24 days ago - Stars: 11 - Forks: 2

MILVLG/openvqa

A lightweight, scalable, and general framework for visual question answering research

Language: Python - Size: 833 KB - Last synced: 16 days ago - Pushed: over 2 years ago - Stars: 307 - Forks: 64

reshalfahsi/vqa-clip-lstm

Visual Question Answering Using CLIP + LSTM

Language: Jupyter Notebook - Size: 4.67 MB - Last synced: 26 days ago - Pushed: 26 days ago - Stars: 0 - Forks: 0

ellenzhuwang/implicitOOD

An end-to-end vision and language model incorporating explicit knowledge graphs and OOD-detection.

Language: Python - Size: 175 KB - Last synced: 29 days ago - Pushed: 29 days ago - Stars: 2 - Forks: 0

Toloka/WSDMCup2023

Toloka Visual Question Answering Challenge at WSDM Cup 2023

Language: Jupyter Notebook - Size: 5.24 MB - Last synced: 29 days ago - Pushed: about 1 month ago - Stars: 29 - Forks: 7

mlvlab/Flipped-VQA

Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)

Language: Python - Size: 1.23 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 54 - Forks: 7

mlvlab/OVQA

Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 2023)

Language: Python - Size: 619 KB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 13 - Forks: 0

davidmascharka/tbd-nets

PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

Language: Jupyter Notebook - Size: 21.8 MB - Last synced: about 1 month ago - Pushed: over 2 years ago - Stars: 349 - Forks: 74

MMStar-Benchmark/MMStar

This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"

Language: Python - Size: 3.41 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 84 - Forks: 1

jnhwkim/ban-vqa 📦

Bilinear attention networks for visual question answering

Language: Python - Size: 1.21 MB - Last synced: about 1 month ago - Pushed: 7 months ago - Stars: 534 - Forks: 101

VisualWebBench/VisualWebBench

Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"

Language: Python - Size: 2.95 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 9 - Forks: 0

Yushi-Hu/tifa

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

Language: Python - Size: 6.07 MB - Last synced: 2 months ago - Pushed: 5 months ago - Stars: 107 - Forks: 5

basakbuluz/Visual-Question-Answering

Visual Question Answering Demo and Algorithmia API

Language: Jupyter Notebook - Size: 53 MB - Last synced: about 2 months ago - Pushed: over 5 years ago - Stars: 26 - Forks: 6

peteanderson80/bottom-up-attention

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

Language: Jupyter Notebook - Size: 13.4 MB - Last synced: about 2 months ago - Pushed: over 1 year ago - Stars: 1,401 - Forks: 378

Cyanogenoid/pytorch-vqa

Strong baseline for visual question answering

Language: Python - Size: 21.5 KB - Last synced: about 1 month ago - Pushed: about 1 year ago - Stars: 237 - Forks: 98

gutbash/lmm-graph-vision

How well do the GPT-4V, Gemini Pro Vision, and Claude 3 Opus models perform zero-shot vision tasks on data structures?

Language: Python - Size: 159 MB - Last synced: 24 days ago - Pushed: about 2 months ago - Stars: 1 - Forks: 1

salesforce/BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Language: Jupyter Notebook - Size: 6.34 MB - Last synced: about 2 months ago - Pushed: 8 months ago - Stars: 4,189 - Forks: 558

XingruiWang/3D-Aware-VQA

Official Code for the NeurIPS'23 paper "3D-Aware Visual Question Answering about Parts, Poses and Occlusions"

Language: Jupyter Notebook - Size: 29.5 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 11 - Forks: 0

inferless/Moondream1

Moondream1 is a 1.6B parameter model built using SigLIP, Phi-1.5 and the LLaVa training dataset.

Language: Python - Size: 24.4 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0

inferless/Moondream2

Moondream2 is a small vision language model designed to run efficiently on edge devices.

Language: Python - Size: 27.3 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0

double125/Graph-Matching-Attention

Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering

Language: Python - Size: 2.5 MB - Last synced: 2 months ago - Pushed: over 1 year ago - Stars: 8 - Forks: 1

bowen-upenn/Multi-Agent-VQA

Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering

Language: Python - Size: 9.97 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 0 - Forks: 0

mapluisch/LLaVA-CLI-with-multiple-images

LLaVA inference with multiple images at once for cross-image analysis.

Language: Python - Size: 24 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 12 - Forks: 1

kkahatapitiya/LangRepo

Language Repository for Long Video Understanding

Language: Python - Size: 609 KB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 6 - Forks: 1

lucidrains/AoA-pytorch

A Pytorch implementation of Attention on Attention module (both self and guided variants), for Visual Question Answering

Language: Python - Size: 39.1 KB - Last synced: 30 days ago - Pushed: over 3 years ago - Stars: 40 - Forks: 5

shikamaru-96/Visual-Question-Answering

Implementation of the visual question answering model from the paper "Exploring Models and Data for Image Question Answering".

Language: Python - Size: 8.1 MB - Last synced: 2 months ago - Pushed: about 6 years ago - Stars: 10 - Forks: 3

atharva-naik/MMML-TermProject-VizWiz-VQA-Challenge

VizWiz Challenge Term Project for Multi Modal Machine Learning @ CMU (11777)

Language: Python - Size: 90.6 MB - Last synced: 3 months ago - Pushed: 9 months ago - Stars: 0 - Forks: 0

mliu-dark-knight/VQA

Language: Python - Size: 82 KB - Last synced: 3 months ago - Pushed: over 6 years ago - Stars: 1 - Forks: 0

badripatro/awesome-vqg

Visual Question Generation reading list

Size: 18.6 KB - Last synced: 3 days ago - Pushed: almost 4 years ago - Stars: 27 - Forks: 4

winnedatsch/tuw-master-thesis

This repository hosts the code for Jan Hadl's Master Thesis at TU Wien: GS-VQA, a zero-shot visual question answering (VQA) pipeline that uses vision-language models (VLMs) for visual perception and answer-set programming (ASP) for symbolic reasoning.

Language: Jupyter Notebook - Size: 37.3 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 0 - Forks: 0

YehLi/xmodaler

X-modaler is a versatile and high-performance codebase for cross-modal analytics (e.g., image captioning, video captioning, vision-language pre-training, visual question answering, visual commonsense reasoning, and cross-modal retrieval).

Language: Python - Size: 12.2 MB - Last synced: 3 months ago - Pushed: over 1 year ago - Stars: 996 - Forks: 124

ai-forever/fusion_brain_aij2021

Creating multimodal multitask models

Language: Jupyter Notebook - Size: 4.29 MB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 47 - Forks: 16

hackerchenzhuo/LaKo

[Paper][IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection

Language: Python - Size: 64.9 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 22 - Forks: 3

thatAverageGuy/EarlyFusion-on-EasyVQA

Streamlit app for demonstrating multi-modal (vision + language) modelling in PyTorch.

Language: Python - Size: 2.74 MB - Last synced: 4 months ago - Pushed: almost 2 years ago - Stars: 1 - Forks: 0

kHarshit/visual-question-answering

Visual Question Answering System using ViT, GPT, BERT (LLMs)

Language: Jupyter Notebook - Size: 3.62 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

China-UK-ZSL/ZS-F-VQA

[Paper][ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph

Language: Python - Size: 37.3 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 59 - Forks: 14

Cloud-CV/vilbert-multi-task

12-in-1: Multi-Task Vision and Language Representation Learning Web Demo

Language: Python - Size: 1.27 MB - Last synced: about 2 months ago - Pushed: over 1 year ago - Stars: 35 - Forks: 4

Axe--/Visual-Question-Answering

PyTorch Implementation of VQA Baseline & Hierarchical Co-Attention model

Language: Python - Size: 364 KB - Last synced: 3 months ago - Pushed: 8 months ago - Stars: 16 - Forks: 5

markdtw/vqa-winner-cvprw-2017

PyTorch implementation of the winner of the VQA Challenge Workshop at CVPR'17

Language: Python - Size: 25.4 KB - Last synced: about 1 month ago - Pushed: over 5 years ago - Stars: 165 - Forks: 38

vzhou842/easy-VQA

The Easy Visual Question Answering dataset.

Language: Python - Size: 9.5 MB - Last synced: about 1 month ago - Pushed: 8 months ago - Stars: 32 - Forks: 11

ghazaleh-mahmoodi/lxmert_compression

B.Sc. Final Project: LXMERT Model Compression for Visual Question Answering.

Language: Python - Size: 11.6 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 2 - Forks: 1

anujanegi/VQA

Visual Question Answering System

Language: Python - Size: 181 MB - Last synced: 11 days ago - Pushed: over 4 years ago - Stars: 11 - Forks: 0

RachanaJayaram/Cross-Attention-VizWiz-VQA

A self-evident application of the VQA task is to design systems that aid blind people with sight-reliant queries. The VizWiz VQA dataset originates from images and questions compiled by members of the visually impaired community and, as such, highlights some of the challenges presented by this particular use case.

Language: Python - Size: 2.55 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 12 - Forks: 6

sdc17/UPop

[ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.

Language: Python - Size: 2.4 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 54 - Forks: 3

ExplainableML/CLEVR-X

CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations

Language: Python - Size: 2.27 MB - Last synced: 7 months ago - Pushed: 7 months ago - Stars: 21 - Forks: 1

ivonajdenkoska/multimodal-meta-learn

Official code repository for "Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning" (published at ICLR 2023).

Language: Python - Size: 13.8 MB - Last synced: 7 months ago - Pushed: 12 months ago - Stars: 34 - Forks: 1

SKTBrain/KVQA

Korean Visual Question Answering

Size: 3.58 MB - Last synced: 7 months ago - Pushed: over 4 years ago - Stars: 51 - Forks: 4

rentainhe/TRAR-VQA

[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"

Language: Python - Size: 927 KB - Last synced: 7 months ago - Pushed: over 2 years ago - Stars: 54 - Forks: 17

JunweiLiang/FVTA_MemexQA

Real-world photo sequence question answering system (MemexQA). CVPR'18 and TPAMI'19

Language: Python - Size: 723 KB - Last synced: 7 months ago - Pushed: almost 5 years ago - Stars: 33 - Forks: 15

ITE-5th/visual-question-answering

Language: Python - Size: 621 KB - Last synced: 7 months ago - Pushed: over 6 years ago - Stars: 1 - Forks: 1

ITE-5th/eya

Language: Python - Size: 16.9 MB - Last synced: 7 months ago - Pushed: almost 6 years ago - Stars: 0 - Forks: 0

Glaciohound/VCML

PyTorch implementation of the paper "Visual Concept-Metaconcept Learner", NeurIPS 2019

Language: Python - Size: 2.43 MB - Last synced: 7 months ago - Pushed: over 4 years ago - Stars: 47 - Forks: 7

badripatro/awesome-visual-dialog

Visual Dialog

Size: 16.6 KB - Last synced: about 2 months ago - Pushed: almost 4 years ago - Stars: 15 - Forks: 1

aioz-ai/ECCVW20_MILQT

Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering (ECCVW 2020)

Language: Python - Size: 1.58 MB - Last synced: 2 days ago - Pushed: over 3 years ago - Stars: 5 - Forks: 0

sdc17/CrossGET

CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.

Size: 1.19 MB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 13 - Forks: 0

engindeniz/vitis

[ICCV 2023 CLVL Workshop] Zero-Shot and Few-Shot Video Question Answering with Multi-Modal Prompts

Language: Python - Size: 3.91 KB - Last synced: 8 months ago - Pushed: 8 months ago - Stars: 0 - Forks: 0

sepiatone/cv-question_answering

Visual Question Answering is a Computer Vision / Natural Language Processing task whose objective is to be able to provide a natural language answer to a natural language question about a given image

Language: Python - Size: 13.8 MB - Last synced: 8 months ago - Pushed: about 4 years ago - Stars: 1 - Forks: 0

zero-nnkn/vision-assistant-services

👁‍🗨 Vision Assistant (Backend): Smart Assistant for Visually Impaired People

Language: Python - Size: 21.7 MB - Last synced: 25 days ago - Pushed: 25 days ago - Stars: 2 - Forks: 1

maj34/Eye-Handicapped-Service

[X:AI Conference] A guide service for visually impaired people

Language: Jupyter Notebook - Size: 31.4 MB - Last synced: 10 days ago - Pushed: 9 months ago - Stars: 10 - Forks: 1

zhegan27/VILLA

Research Code for NeurIPS 2020 Spotlight paper "Large-Scale Adversarial Training for Vision-and-Language Representation Learning": UNITER adversarial training part

Language: Python - Size: 849 KB - Last synced: 7 months ago - Pushed: over 3 years ago - Stars: 115 - Forks: 12

Dafterfly/Quick_Vilt

A CLI and GUI for using the Vision-and-Language Transformer (ViLT) model for visual question answering (answering questions based on an image)

Language: Python - Size: 15.6 KB - Last synced: 4 months ago - Pushed: 9 months ago - Stars: 0 - Forks: 0

suraj-maniyar/VQA-PyTorch

My implementation of VQA in PyTorch

Language: Python - Size: 83.7 MB - Last synced: 9 months ago - Pushed: almost 4 years ago - Stars: 1 - Forks: 0

DelTA-Lab-IITK/PDUN

Probabilistic framework for solving Visual Dialog

Size: 21 MB - Last synced: 10 months ago - Pushed: over 4 years ago - Stars: 1 - Forks: 0

juletx/egunean-behin-vqa

Egunean Behin Visual Question Answering Dataset

Language: Jupyter Notebook - Size: 12.9 MB - Last synced: 10 months ago - Pushed: about 2 years ago - Stars: 2 - Forks: 0

AliMostafaRadwan/VQA_real-time

visual question answering in real-time

Language: Python - Size: 8.79 KB - Last synced: 10 months ago - Pushed: 10 months ago - Stars: 2 - Forks: 1

noagarcia/ROLL-VideoQA

PyTorch code for ROLL, a knowledge-based video story question answering model.

Language: Python - Size: 522 KB - Last synced: 10 months ago - Pushed: over 3 years ago - Stars: 20 - Forks: 4

aligholami/hexia

Mid-level PyTorch Based Framework for Visual Question Answering.

Language: Python - Size: 20 MB - Last synced: 10 months ago - Pushed: 11 months ago - Stars: 24 - Forks: 2

kshitij98/cViL

cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation

Language: Python - Size: 762 KB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 0 - Forks: 0

antoyang/just-ask

[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

Language: Jupyter Notebook - Size: 917 KB - Last synced: 10 months ago - Pushed: over 1 year ago - Stars: 98 - Forks: 13

MunzerDw/DLVC-3DVQA

This is the official repository of our report 3D Visual Question Answering by Leonard Schenk and Munzer Dwedari for the course Deep Learning in Visual Computing.

Language: Python - Size: 18.6 MB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 6 - Forks: 0

avi-jit/RadiologyQA

MSR Cambridge Internship Summer 2023

Language: Vue - Size: 1.97 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 1 - Forks: 0

itsmariodias/bert-mcoatt-vqa

BERT based Multiple Parallel Co-Attention Networks for Visual Question Answering

Language: Python - Size: 481 KB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 0 - Forks: 0

Pegayus/vqa-npi

An extended implementation of Neural Program Interpreter (NPI) for Visual Question Answering (VQA) tasks, built on Python 3.5 and Tensorflow 1.12.0. Includes detailed instructions for replication and customization.

Language: Python - Size: 28.6 MB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 0

yousefkotp/Visual-Question-Answering

A lightweight deep learning model with a web application that answers image-based questions using a non-generative approach for the VizWiz Grand Challenge 2023, built by carefully curating the answer vocabulary and adding a linear layer on top of OpenAI's CLIP model as the image and text encoder

Language: Jupyter Notebook - Size: 15.9 MB - Last synced: 11 months ago - Pushed: 11 months ago - Stars: 0 - Forks: 1

noagarcia/awesome-vqa-pytorch

List of PyTorch repositories for visual question answering

Size: 1.95 KB - Last synced: 10 months ago - Pushed: almost 5 years ago - Stars: 13 - Forks: 3

OmerShubi/DL_VQA

Visual Question Answering (VQA) Model

Language: Python - Size: 267 KB - Last synced: 12 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 1

guoyang9/vqa-prior

Implementation for our SIGIR 2019 paper --- Quantifying and Alleviating the Language Prior Problem in Visual Question Answering.

Language: Python - Size: 33.2 KB - Last synced: 12 months ago - Pushed: over 4 years ago - Stars: 6 - Forks: 3

abhinav-neil/socratic-models (fork of milenakapralova/socraticmodels)

Socratic models for multimodal reasoning & image captioning

Language: Jupyter Notebook - Size: 48.8 MB - Last synced: 12 months ago - Pushed: 12 months ago - Stars: 0 - Forks: 0

UsefGamal/Visual-Question-Answering-VQA

A multimodal project in which a vision model that understands images is combined with an NLP model that understands questions, in order to provide answers based on both the questions and the images

Language: Jupyter Notebook - Size: 2.19 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

ParadoxZW/prophet (fork of MILVLG/prophet)

Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

Language: Python - Size: 1.07 MB - Last synced: about 1 year ago - Pushed: about 1 year ago - Stars: 0 - Forks: 0

mesnico/RelationNetworks-CLEVR

A pytorch implementation for "A simple neural network module for relational reasoning", working on the CLEVR dataset

Language: Python - Size: 3.68 MB - Last synced: about 1 year ago - Pushed: over 4 years ago - Stars: 83 - Forks: 26

Related Keywords
visual-question-answering (162), vqa (56), deep-learning (46), pytorch (31), computer-vision (23), machine-learning (18), natural-language-processing (16), large-language-models (15), image-captioning (15), multimodal-learning (13), question-answering (13), vision-and-language (12), visual-reasoning (11), python (10), vqa-dataset (10), multimodal-deep-learning (10), neural-networks (8), vision-language (8), nlp (8), attention-mechanism (8), multimodal (8), attention (7), video-question-answering (7), tensorflow (6), image-classification (6), visual-question-generation (6), deep-neural-networks (6), lstm (6), transformers (5), visual-questions-generation (5), vision-language-transformer (5), keras (5), llm (5), large-multimodal-models (5), python3 (5), knowledge-graph (5), multi-modal (4), dataset (4), bert (4), evaluation (4), transformer (4), foundation-models (4), vqav2 (4), llms (4), image-text-retrieval (4), convolutional-neural-networks (4), multimodal-large-language-models (4), vizwiz-vqa (4), image-segmentation (4), medical-imaging (4), image-processing (4), video-understanding (4), visual-dialog (4), ai (4), visualization (3), few-shot-learning (3), domain-adaptation (3), cnn (3), prompt-engineering (3), clevr (3), question-generation (3), llava (3), clip (3), aioz (3), multi-modal-learning (3), videoqa (3), radiology (3), text-to-speech (3), pretraining (3), gpt-3 (3), artificial-intelligence (3), aioz-ai (3), object-detection (3), scene-graph (3), multimodality (3), gru (3), faster-rcnn (3), vqg (3), stacked-attention-networks (3), emnlp2017 (2), vision-and-language-pre-training (2), emnlp2018 (2), graph-attention-networks (2), gqa (2), keras-tensorflow (2), acl (2), compositional-attention-networks (2), questions-and-answers (2), torch (2), vizwiz (2), relation-network (2), large-vision-language-models (2), image-encoding (2), streamlit (2), emnlp (2), text-encoding (2), caffe (2), fine-tuning (2), image-text (2), python-3 (2)