Ecosyste.ms: Repos

An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: vqa

Repositories

aws-samples/visual-question-answering-finetuning

Finetuning Large Visual Models on Visual Question Answering

Language: Jupyter Notebook - Size: 99.6 KB - Last synced: 3 days ago - Pushed: 3 days ago - Stars: 0 - Forks: 0

facebookresearch/mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

Language: Python - Size: 17.1 MB - Last synced: 3 days ago - Pushed: 3 months ago - Stars: 5,424 - Forks: 922

stanfordnlp/mac-network

Implementation for the paper "Compositional Attention Networks for Machine Reasoning" (Hudson and Manning, ICLR 2018)

Language: Python - Size: 205 KB - Last synced: 4 days ago - Pushed: almost 3 years ago - Stars: 489 - Forks: 123

j-min/DSG

Davidsonian Scene Graph (DSG) for Text-to-Image Evaluation (ICLR 2024)

Language: Jupyter Notebook - Size: 4.4 MB - Last synced: 7 days ago - Pushed: 7 days ago - Stars: 58 - Forks: 3

AdrianBZG/llama-multimodal-vqa

Multimodal Instruction Tuning for Llama 3

Language: Python - Size: 31.3 KB - Last synced: 5 days ago - Pushed: 30 days ago - Stars: 10 - Forks: 2

jokieleung/awesome-visual-question-answering

A curated list of Visual Question Answering (VQA) (Image/Video Question Answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.

Size: 179 KB - Last synced: 6 days ago - Pushed: 11 months ago - Stars: 644 - Forks: 95

open-compass/VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4V, Gemini, QwenVLPlus, 40+ HF models, and 20+ benchmarks

Language: Python - Size: 1.48 MB - Last synced: 15 days ago - Pushed: 15 days ago - Stars: 424 - Forks: 46

oliverc1623/ceriad

An extension of the Planner-Actor-Reporter framework applied to autonomous vehicles in Highway-Env and CARLA.

Language: Python - Size: 163 MB - Last synced: 17 days ago - Pushed: 17 days ago - Stars: 8 - Forks: 2

MILVLG/openvqa

A lightweight, scalable, and general framework for visual question answering research

Language: Python - Size: 833 KB - Last synced: 10 days ago - Pushed: over 2 years ago - Stars: 307 - Forks: 64

ai4streaming-workshop/ai4streaming-workshop.github.io

AIS: Vision, Graphics and AI for Streaming Workshop at CVPR 2024

Language: CSS - Size: 15.2 MB - Last synced: 20 days ago - Pushed: 20 days ago - Stars: 1 - Forks: 0

reshalfahsi/vqa-clip-lstm

Visual Question Answering Using CLIP + LSTM

Language: Jupyter Notebook - Size: 4.67 MB - Last synced: 20 days ago - Pushed: 20 days ago - Stars: 0 - Forks: 0

TheoCoombes/ClipCap

Using pretrained encoder and language models to generate captions from multimedia inputs.

Language: Python - Size: 92.7 MB - Last synced: 22 days ago - Pushed: about 1 year ago - Stars: 92 - Forks: 15

AIRI-Institute/OmniFusion

OmniFusion — a multimodal model to communicate using text and images

Language: Python - Size: 22.2 MB - Last synced: 27 days ago - Pushed: 27 days ago - Stars: 187 - Forks: 17

MILVLG/activitynet-qa

A VideoQA dataset based on the videos from ActivityNet

Language: Python - Size: 2.03 MB - Last synced: 10 days ago - Pushed: over 3 years ago - Stars: 54 - Forks: 9

dinhquy-nguyen-1704/Visual_Question_Answering

Visual Question Answering with yes/no answers on the COCO dataset

Language: Python - Size: 287 KB - Last synced: 28 days ago - Pushed: 4 months ago - Stars: 3 - Forks: 0

chingyaoc/awesome-vqa

Visual Q&A reading list

Size: 2.84 MB - Last synced: 6 days ago - Pushed: over 5 years ago - Stars: 431 - Forks: 94

rabiulcste/vqazero

Visual question answering prompting recipes for large vision-language models

Language: Python - Size: 1.36 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 8 - Forks: 1

psnonis/TikiAI

TikiAI Natural Visual QA : UC Berkeley MIDS w251 Final Project

Language: Python - Size: 20.3 MB - Last synced: about 1 month ago - Pushed: over 4 years ago - Stars: 4 - Forks: 2

OpenGVLab/Multi-Modality-Arena

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

Language: Python - Size: 21.5 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 360 - Forks: 23

SatyamGaba/visual_question_answering

Visual Question Answering in PyTorch with various Attention Models

Language: Jupyter Notebook - Size: 1.21 MB - Last synced: about 1 month ago - Pushed: about 4 years ago - Stars: 18 - Forks: 9

Cadene/vqa.pytorch

Visual Question Answering in Pytorch

Language: Python - Size: 1.73 MB - Last synced: about 1 month ago - Pushed: over 4 years ago - Stars: 699 - Forks: 178

VQA-Team/Visual-Question-Answering

An Android application that helps visually impaired users take a picture and ask questions about it; the application answers using machine learning techniques and tools.

Language: Jupyter Notebook - Size: 17 MB - Last synced: 14 days ago - Pushed: almost 2 years ago - Stars: 3 - Forks: 4

lmelvix/visual-question-answering-tensorflow

Stacked attention network for answering open-ended questions about an image

Language: Python - Size: 163 KB - Last synced: about 1 month ago - Pushed: almost 6 years ago - Stars: 12 - Forks: 7

findalexli/SciGraphQA

SciGraphQA

Language: Jupyter Notebook - Size: 16.7 MB - Last synced: 28 days ago - Pushed: 10 months ago - Stars: 32 - Forks: 2

davidmascharka/tbd-nets

PyTorch implementation of "Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning"

Language: Jupyter Notebook - Size: 21.8 MB - Last synced: about 1 month ago - Pushed: over 2 years ago - Stars: 349 - Forks: 74

OpenQuantumComputing/QAOA

This package is a flexible Python implementation of the Quantum Approximate Optimization Algorithm / Quantum Alternating Operator Ansatz (QAOA), aimed at researchers who want to readily test the performance of a new ansatz, a new classical optimizer, etc.

Language: Python - Size: 12.4 MB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 1 - Forks: 2

chandrakanthm/visual-question-generator

Language: Python - Size: 1.36 MB - Last synced: about 1 month ago - Pushed: about 7 years ago - Stars: 1 - Forks: 0

pairlab/SlotFormer

Code release for ICLR 2023 paper: SlotFormer on object-centric dynamics models

Language: Python - Size: 19.6 MB - Last synced: about 1 month ago - Pushed: 8 months ago - Stars: 91 - Forks: 19

OpenGVLab/InternGPT

InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, multimodal chat like GPT-4, SAM, interactive image editing, etc. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM)

Language: Python - Size: 41.9 MB - Last synced: about 1 month ago - Pushed: 6 months ago - Stars: 3,107 - Forks: 224

basakbuluz/Visual-Question-Answering

Visual Question Answering Demo and Algorithmia API

Language: Jupyter Notebook - Size: 53 MB - Last synced: about 1 month ago - Pushed: over 5 years ago - Stars: 26 - Forks: 6

peteanderson80/bottom-up-attention

Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome

Language: Jupyter Notebook - Size: 13.4 MB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 1,401 - Forks: 378

lupantech/IconQA

Data and code for NeurIPS 2021 Paper "IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning".

Language: Python - Size: 3.57 MB - Last synced: 28 days ago - Pushed: 4 months ago - Stars: 42 - Forks: 13

hengyuan-hu/bottom-up-attention-vqa

An efficient PyTorch implementation of the winning entry of the 2017 VQA Challenge.

Language: Python - Size: 389 KB - Last synced: about 1 month ago - Pushed: 3 months ago - Stars: 745 - Forks: 181

Cyanogenoid/pytorch-vqa

Strong baseline for visual question answering

Language: Python - Size: 21.5 KB - Last synced: about 1 month ago - Pushed: about 1 year ago - Stars: 237 - Forks: 98

gutbash/lmm-graph-vision

How well do the GPT-4V, Gemini Pro Vision, and Claude 3 Opus models perform zero-shot vision tasks on data structures?

Language: Python - Size: 159 MB - Last synced: 18 days ago - Pushed: about 2 months ago - Stars: 1 - Forks: 1

Wuziyi616/SlotDiffusion

Code release for NeurIPS 2023 paper SlotDiffusion: Object-centric Learning with Diffusion Models

Language: Python - Size: 6.12 MB - Last synced: about 1 month ago - Pushed: 4 months ago - Stars: 68 - Forks: 6

likenneth/mmgnn_textvqa

A Pytorch implementation of CVPR 2020 paper: Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

Language: Python - Size: 9.59 MB - Last synced: 4 days ago - Pushed: about 1 year ago - Stars: 46 - Forks: 17

aioz-ai/CFR_VQA

Coarse-to-Fine Reasoning for Visual Question Answering (CVPRW'22)

Language: Python - Size: 273 KB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 38 - Forks: 6

csebuetnlp/IllusionVQA

This repository contains the data and code of the paper titled "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models"

Language: Jupyter Notebook - Size: 86.5 MB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 3 - Forks: 1

shure-dev/Awesome-LLM-related-Papers-Comprehensive-Topics

Awesome LLM-related papers and repos on very comprehensive topics.

Language: Jupyter Notebook - Size: 352 KB - Last synced: about 2 months ago - Pushed: about 2 months ago - Stars: 149 - Forks: 16

microsoft/Oscar

Oscar and VinVL

Language: Python - Size: 715 KB - Last synced: about 2 months ago - Pushed: 9 months ago - Stars: 1,024 - Forks: 248

aioz-ai/MICCAI19-MedVQA

AIOZ AI - Overcoming Data Limitation in Medical Visual Question Answering (MICCAI 2019)

Language: Python - Size: 137 KB - Last synced: about 1 month ago - Pushed: 8 months ago - Stars: 43 - Forks: 25

XIRZC/rec2vqa

This repository is about Referring Expression Comprehension Based Visual Question Answering.

Language: Jupyter Notebook - Size: 171 MB - Last synced: about 1 month ago - Pushed: about 1 month ago - Stars: 0 - Forks: 0

jayleicn/ClipBERT

[CVPR 2021 Best Student Paper Honorable Mention, Oral] Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks.

Language: Python - Size: 73.2 KB - Last synced: about 2 months ago - Pushed: 10 months ago - Stars: 685 - Forks: 85

BDBC-KG-NLP/QA-Survey-CN

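The natural language processing research team at Beihang University's Advanced Innovation Center for Big Data summarizes research and applications in intelligent question answering, including knowledge-graph-based question answering (KBQA), text-based question answering (TextQA), table-based question answering (TableQA), vision-based question answering (VisualQA), and machine reading comprehension (MRC). For each task type, it summarizes related work from both academia and industry.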

Size: 23.9 MB - Last synced: about 2 months ago - Pushed: about 1 year ago - Stars: 1,592 - Forks: 258

mapluisch/LLaVA-CLI-with-multiple-images

LLaVA inference with multiple images at once for cross-image analysis.

Language: Python - Size: 24 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 12 - Forks: 1

lucidrains/AoA-pytorch

A Pytorch implementation of Attention on Attention module (both self and guided variants), for Visual Question Answering

Language: Python - Size: 39.1 KB - Last synced: 24 days ago - Pushed: over 3 years ago - Stars: 40 - Forks: 5

nerdimite/neuro-symbolic-ai-soc

Neuro-Symbolic Visual Question Answering on Sort-of-CLEVR using PyTorch

Language: Jupyter Notebook - Size: 8.99 MB - Last synced: about 1 month ago - Pushed: over 2 years ago - Stars: 52 - Forks: 13

shikamaru-96/Visual-Question-Answering

Implementation of the visual question answering model from the paper "Exploring Models and Data for Image Question Answering".

Language: Python - Size: 8.1 MB - Last synced: 2 months ago - Pushed: about 6 years ago - Stars: 10 - Forks: 3

fulcus/3neurons-artificial-neural-networks-and-deep-learning

3 deep learning challenges, consisting of Image Classification, Image Segmentation, Visual Question Answering

Language: Jupyter Notebook - Size: 2.41 MB - Last synced: 2 months ago - Pushed: over 2 years ago - Stars: 0 - Forks: 0

MILVLG/rosita

ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration

Language: Python - Size: 15.9 MB - Last synced: 10 days ago - Pushed: 12 months ago - Stars: 55 - Forks: 13

radonys/CFB-VQA

VQA Challenge - hosted on Hasura using Flask

Language: Python - Size: 49.5 MB - Last synced: 2 months ago - Pushed: about 6 years ago - Stars: 2 - Forks: 0

pranavgupta2603/CLIP-ViL-GradCAM

An implementation of CLIP-ViL Grad-CAM for VQA tasks

Language: Jupyter Notebook - Size: 63.8 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 0 - Forks: 0

FuxiaoLiu/LRV-Instruction

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

Language: Python - Size: 23.9 MB - Last synced: 2 months ago - Pushed: 2 months ago - Stars: 193 - Forks: 13

badripatro/awesome-vqg

Visual Question Generation reading list

Size: 18.6 KB - Last synced: 7 days ago - Pushed: over 3 years ago - Stars: 27 - Forks: 4

zchoi/SNLC

[PR23] The implementation of the paper "Learning Visual Question Answering on Controlled Semantic Noisy Labels"

Language: Python - Size: 10.2 MB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 7 - Forks: 0

sutdcv/SUTD-TrafficQA

[CVPR2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events

Language: JavaScript - Size: 7.24 MB - Last synced: 3 months ago - Pushed: over 1 year ago - Stars: 41 - Forks: 2

aioz-ai/MICCAI21_MMQ

Multiple Meta-model Quantifying for Medical Visual Question Answering (MICCAI 2021)

Language: Python - Size: 279 KB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 33 - Forks: 9

NExTplusplus/TAT-DQA

TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning

Size: 1010 KB - Last synced: 3 months ago - Pushed: 3 months ago - Stars: 13 - Forks: 1

hila-chefer/Transformer-MM-Explainability

[ICCV 2021- Oral] Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.

Language: Jupyter Notebook - Size: 25.3 MB - Last synced: 3 months ago - Pushed: 9 months ago - Stars: 682 - Forks: 101

aioz-ai/ICCV19_VQA-CTI

Compact Trilinear Interaction for Visual Question Answering (ICCV 2019)

Language: Python - Size: 818 KB - Last synced: about 1 month ago - Pushed: over 1 year ago - Stars: 39 - Forks: 8

hackerchenzhuo/LaKo

[Paper][IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection

Language: Python - Size: 64.9 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 22 - Forks: 3

kushalkafle/DVQA_dataset

DVQA Dataset: A Bar chart question answering dataset presented at CVPR 2018

Size: 12.7 KB - Last synced: 3 months ago - Pushed: almost 5 years ago - Stars: 29 - Forks: 1

ap229997/Conditional-Batch-Norm

Pytorch implementation of NIPS 2017 paper "Modulating early visual processing by language"

Language: Python - Size: 40 KB - Last synced: about 2 months ago - Pushed: over 5 years ago - Stars: 59 - Forks: 11

avinabsaha/HIDRO-VQA

Official Implementation of WACV 2024 Paper "HIDRO-VQA : High Dynamic Range Oracle for Video Quality Assessment"

Language: Python - Size: 98.6 KB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 8 - Forks: 1

ab3llini/Transformer-VQA

Transformer-based VQA system capable of generating unconstrained, open-ended answers based on OpenAI's GPT-2 117M

Language: Python - Size: 24.8 MB - Last synced: 4 months ago - Pushed: about 2 years ago - Stars: 1 - Forks: 0

China-UK-ZSL/ZS-F-VQA

[Paper][ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph

Language: Python - Size: 37.3 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 59 - Forks: 14

NVlabs/prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

Language: Python - Size: 4.25 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 1,245 - Forks: 70

dinesh-kumar-mr/MediVQA

Part of our final year project work involving complex NLP tasks along with experimentation on various datasets and different LLMs

Language: HTML - Size: 1.98 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 0 - Forks: 0

Axe--/Visual-Question-Answering

PyTorch Implementation of VQA Baseline & Hierarchical Co-Attention model

Language: Python - Size: 364 KB - Last synced: 2 months ago - Pushed: 8 months ago - Stars: 16 - Forks: 5

CarolineGao/LoRA-Dataset

[NeurIPS2023] LoRA: A Logical Reasoning Augmented Dataset for Visual Question Answering

Language: Jupyter Notebook - Size: 12.4 MB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 3 - Forks: 0

abachaa/VQA-Med-2019

Visual Question Answering in the Medical Domain VQA-Med 2019

Size: 20 MB - Last synced: 4 months ago - Pushed: 4 months ago - Stars: 65 - Forks: 26

vzhou842/easy-VQA-demo

A Web-based Javascript Demo of an easy-VQA model.

Language: JavaScript - Size: 2.89 MB - Last synced: about 1 month ago - Pushed: over 4 years ago - Stars: 13 - Forks: 5

vzhou842/easy-VQA

The Easy Visual Question Answering dataset.

Language: Python - Size: 9.5 MB - Last synced: about 1 month ago - Pushed: 8 months ago - Stars: 32 - Forks: 11

vzhou842/easy-VQA-keras

A Keras implementation of VQA using the easy-VQA dataset.

Language: Python - Size: 42 KB - Last synced: about 1 month ago - Pushed: almost 4 years ago - Stars: 22 - Forks: 7

Themiscodes/Quantum-Neural-Networks

Implementations of quantum circuits used for the thesis Quantum Neural Networks with Qutrits

Language: Jupyter Notebook - Size: 16.8 MB - Last synced: 5 months ago - Pushed: 5 months ago - Stars: 1 - Forks: 0

YangLiu9208/CausalVLR

CausalVLR: A Toolbox and Benchmark for Visual-Linguistic Causal Reasoning

Size: 37.3 MB - Last synced: 4 months ago - Pushed: 9 months ago - Stars: 15 - Forks: 4

wildchaser1703/IntelliQuery

Deep Learning-Powered Visual & Textual Answering System

Language: Jupyter Notebook - Size: 187 KB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 0 - Forks: 0

Letian2003/C-VQA

Counterfactual Reasoning VQA Dataset

Language: Python - Size: 271 KB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 14 - Forks: 2

ghazaleh-mahmoodi/lxmert_compression

B.Sc. Final Project: LXMERT Model Compression for Visual Question Answering.

Language: Python - Size: 11.6 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 2 - Forks: 1

phiyodr/vqaloader

PyTorch DataLoader for many VQA datasets

Language: Python - Size: 37.1 KB - Last synced: 12 days ago - Pushed: over 1 year ago - Stars: 6 - Forks: 1

Allenpandas/BLIP-ImageCaptioning (fork of salesforce/BLIP)

Fork of BLIP image captioning from Salesforce

Language: Python - Size: 8.13 MB - Last synced: 6 months ago - Pushed: 6 months ago - Stars: 1 - Forks: 0

Related Keywords

vqa (211), visual-question-answering (56), pytorch (56), deep-learning (55), vqa-dataset (26), computer-vision (25), question-answering (22), python (17), machine-learning (17), tensorflow (16), multimodal (14), nlp (13), vision-and-language (11), natural-language-processing (11), image-captioning (11), keras (11), attention (9), dataset (9), reasoning (8), clevr (8), visual-reasoning (7), llm (6), clip (6), lstm (6), neural-networks (6), chatgpt (6), attention-mechanism (6), videoqa (6), torch (5), vqg (5), cvpr (5), gqa (5), transformer (5), deep-neural-networks (5), ai (5), causal-inference (5), qa (4), transformers (4), multimodal-deep-learning (4), llava (4), large-language-models (4), explainable-ai (4), video-question-answering (4), video (4), vqav2 (4), evaluation (4), visual-question-generation (4), video-quality-assessment (4), multi-modal (4), chatbot (4), pre-training (4), question-generation (4), benchmark (4), radiology (4), cvpr2018 (3), aioz-ai (3), domain-adaptation (3), stacked-attention-networks (3), cnn (3), iqa (3), artificial-intelligence (3), aioz (3), python3 (3), image-text-retrieval (3), image-classification (3), caffe (3), multimodal-learning (3), dialog (3), faster-rcnn (3), paraphrase-identification (3), compositional-attention-networks (3), machine-reasoning (3), counterfactual (3), classification (3), gpt-4 (3), easy-vqa (3), vqa-med (3), llama (3), medical-imaging (3), knet (3), paraphrase-generation (3), causality (3), cvpr2021 (3), gpt (3), visualization (3), iccv (3), vision-language (3), visual-questions-generation (3), llms (3), gnn (3), attention-model (3), prompt-engineering (3), medical (2), medical-image-processing (2), zero-shot (2), survey (2), julia (2), flask (2), smart-home (2), baseline (2)