An open API service providing repository metadata for many open source software ecosystems.

GitHub topics: visual-reasoning

CSfufu/Revisual-R1

🚀 ReVisual-R1 is a 7B open-source multimodal language model that follows a three-stage curriculum (cold-start pre-training, multimodal reinforcement learning, and text-only reinforcement learning) to produce faithful, concise, and self-reflective reasoning with state-of-the-art performance on visual and textual tasks.

Language: Python - Size: 12.9 MB - Last synced at: about 9 hours ago - Pushed at: about 10 hours ago - Stars: 144 - Forks: 2

eric-ai-lab/GRIT

Official code for paper "GRIT: Teaching MLLMs to Think with Images"

Language: Python - Size: 4.96 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 92 - Forks: 2

MSR3D/MSR3D

[NeurIPS 2024] Official code repository for MSR3D paper

Language: Python - Size: 75.7 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 60 - Forks: 3

andrewliao11/LongPerceptualThoughts

The official implementation of "LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception"

Language: Python - Size: 3.27 MB - Last synced at: about 11 hours ago - Pushed at: 30 days ago - Stars: 4 - Forks: 2

LAMDASZ-ML/Awesome-LLM-Reasoning-with-NeSy

✨✨Latest Advances on Neuro-Symbolic Learning in the era of Large Language Models

Size: 1.3 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 104 - Forks: 5

yangjie-cv/WeThink

WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning

Language: Python - Size: 1.58 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 3 - Forks: 0

hughplay/Visual-Reasoning-Papers

📄 A curated list of visual reasoning papers.

Language: TeX - Size: 3.09 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 26 - Forks: 2

raminguyen/LLMP2

Evaluating ‘Graphical Perception’ with Multimodal Large Language Models

Language: Jupyter Notebook - Size: 508 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

salesforce/BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Language: Jupyter Notebook - Size: 6.34 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 5,265 - Forks: 688

NVlabs/Bongard-HOI

[CVPR 2022 (oral)] Bongard-HOI for benchmarking few-shot visual reasoning

Language: Python - Size: 4.49 MB - Last synced at: 13 days ago - Pushed at: over 2 years ago - Stars: 67 - Forks: 7

NVlabs/RelViT

[ICLR 2022] RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning

Language: Python - Size: 259 KB - Last synced at: 13 days ago - Pushed at: almost 3 years ago - Stars: 63 - Forks: 3

keshik6/HourVideo

[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding

Language: Jupyter Notebook - Size: 8.16 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 67 - Forks: 3

hughplay/TVR

💥 Transformation Driven Visual Reasoning - CVPR 2021

Language: Python - Size: 4.87 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 37 - Forks: 6

sdc17/CrossGET

[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.

Language: Python - Size: 11.6 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 26 - Forks: 0

shijx12/XNM-Net

PyTorch implementation of "Explainable and Explicit Visual Reasoning over Scene Graphs"

Language: Python - Size: 14.4 MB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 94 - Forks: 19

aelnouby/Relational-Networks

PyTorch implementation of the paper "A simple neural network module for relational reasoning", also known as Relational Networks, for visual reasoning.

Language: Python - Size: 31.3 KB - Last synced at: 6 days ago - Pushed at: about 7 years ago - Stars: 9 - Forks: 0

floodsung/Deep-Reasoning-Papers

Recent papers on neural-symbolic reasoning, logical reasoning, visual reasoning, planning, and other topics connecting deep learning and reasoning

Size: 1.32 MB - Last synced at: 11 months ago - Pushed at: about 3 years ago - Stars: 293 - Forks: 34

MILVLG/mcan-vqa

Deep Modular Co-Attention Networks for Visual Question Answering

Language: Python - Size: 1.84 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 432 - Forks: 88

csbobby/STAR_Benchmark

Language: Python - Size: 1.24 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 23 - Forks: 2

jaleedkhan/neusire

NeuSyRE: A Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment

Language: Jupyter Notebook - Size: 46.6 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 3

WellyZhang/RAVEN

RAVEN: A Dataset for Relational and Analogical Visual rEasoNing

Language: Python - Size: 102 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 135 - Forks: 26

wentaoheunnc/HCV-ARR

[AAAI 2023] Hierarchical ConViT with Attention-based Relational Reasoner for Visual Analogical Reasoning

Language: Python - Size: 5.58 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

ethanjperez/film (fork of facebookresearch/clevr-iep)

FiLM: Visual Reasoning with a General Conditioning Layer

Language: Python - Size: 2.48 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 281 - Forks: 55

aligholami/hexia

Mid-level PyTorch Based Framework for Visual Question Answering.

Language: Python - Size: 20 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 24 - Forks: 2

catalina17/VideoNavQA

An alternative EQA paradigm and informative benchmark + models (BMVC 2019, ViGIL 2019 spotlight)

Language: Python - Size: 5.17 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 23 - Forks: 1

cobanov/image-captioning

Image captioning using Python and BLIP

Language: Python - Size: 28.2 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 10 - Forks: 3

marialymperaiou/knowledge-enhanced-multimodal-learning

A list of research papers on knowledge-enhanced multimodal learning

Size: 20.5 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

rs9000/VisualReasoning_MMnet

Visual reasoning modular memory network

Language: Python - Size: 9.33 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

alexmirrington/gat-vqa

Source code for my honours thesis: "Graph Attention Networks for Compositional Visual Question Answering"

Language: Python - Size: 474 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

Sina-Baharlou/VisualGenome-to-Depth

Convert RGB images of Visual-Genome dataset to Depth Maps.

Language: Python - Size: 236 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

markvasin/openvqa (fork of MILVLG/openvqa)

Implementation of the VQA model from my MSc project

Language: Python - Size: 1.97 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

WellyZhang/ALANS

Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning

Language: Python - Size: 2.59 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 8 - Forks: 1

WellyZhang/ACRE

ACRE: Abstract Causal REasoning Beyond Covariation

Language: Python - Size: 2.52 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 14 - Forks: 1

WellyZhang/PrAE

Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution

Language: Python - Size: 44.9 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 13 - Forks: 2

WellyZhang/CoPINet

Learning Perceptual Inference by Contrasting

Language: Python - Size: 23.4 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 25 - Forks: 3

markvasin/MSc-Project

Multimodal Learning and Reasoning for Visual Question Answering

Language: TeX - Size: 42.1 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

alexmirrington/honours-thesis

LaTeX files for my honours thesis: "Graph Attention Networks for Compositional Visual Question Answering"

Language: TeX - Size: 14.1 MB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 1

jaehyunnn/RelationalNetwork_pytorch

An unofficial implementation of Relational Network [A. Santoro et al., 2017] (PyTorch)

Language: Python - Size: 7.81 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 0

markvasin/nscl_reproducability_challenge (fork of COMP6248-Reproducability-Challenge/nscl_reproducability_challenge)

Reproducibility Challenge - The Neuro-Symbolic Concept Learner

Language: Jupyter Notebook - Size: 15.1 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0

Related Keywords
visual-reasoning (39), visual-question-answering (10), deep-learning (8), pytorch (8), vqa (8), scene-graph (5), abstract-reasoning (5), reasoning (5), image-captioning (4), reinforcement-learning (4), image-text-retrieval (4), clevr (4), ravens-progressive-matrices (4), vision-language-transformer (3), knowledge-graph (3), cvpr2021 (3), computer-vision (3), scene-graph-generation (2), awesome (2), vision-language (2), visual-genome (2), machine-learning (2), neuro-symbolic (2), vision-and-language-pre-training (2), planning (2), neural-symbolic-reasoning (2), compositional-attention-networks (2), gqa (2), large-language-models (2), graph-attention-networks (2), multimodal-large-language-models (2), mllm (2), embodied-ai (2), navigation (2), visual-grounding (2), multimodal-reasoning (2), physical-reasoning (2), cvpr2019 (2), question-answering (2), neuro-symbolic-ai (2), attention-mechanism (1), bachelor-project (1), convolutional-neural-networks (1), benchmark (1), conditioning (1), cross-modality (1), deep-neural-networks (1), natural-language-processing (1), aaai2023 (1), visual-understanding (1), scene-graph-to-text (1), scene-graph-to-image (1), scene-graph-enrichment (1), ms-coco (1), knowledge-enrichment (1), image-representation (1), image-generation (1), commonsense-knowledge (1), dataset (1), attention (1), hico-det (1), relationalnetwork-pytorch (1), relational-reasoning (1), neurips-2019 (1), causal-discovery (1), eccv2022 (1), visual-relationship-detection (1), vg-depth (1), tensorflow (1), scene-graph-classification (1), rgb-to-depth (1), predicate-classification (1), depth-maps (1), modular-networks (1), visual-storytelling (1), visual-dialog (1), visual-commonsense-reasoning (1), vision-and-language-navigation (1), vision-and-language (1), story-visualization (1), multimodal-retrieval (1), multimodal-deep-learning (1), multi-task-learning (1), knowledge-enhanced-vision-language (1), knowledge-enhanced-multimodal-learning (1), image-text-matching (1), conditional-image-generation (1), img2text (1), blip (1), videonavqa (1), video (1), multimodal (1), embodied (1), few-shot-learning (1), cvpr2022 (1), multimodel-large-language-model (1), graphical-perception (1), chart-intepretation (1), paper-list (1), visual-language-models (1)