GitHub topics: visual-reasoning
CSfufu/Revisual-R1
🚀 ReVisual-R1 is a 7B open-source multimodal language model that follows a three-stage curriculum (cold-start pre-training, multimodal reinforcement learning, and text-only reinforcement learning) to achieve state-of-the-art performance in visual and textual reasoning that is faithful, concise, and self-reflective.
Language: Python - Size: 12.9 MB - Last synced at: about 9 hours ago - Pushed at: about 10 hours ago - Stars: 144 - Forks: 2

eric-ai-lab/GRIT
Official code for the paper "GRIT: Teaching MLLMs to Think with Images"
Language: Python - Size: 4.96 MB - Last synced at: 1 day ago - Pushed at: 1 day ago - Stars: 92 - Forks: 2

MSR3D/MSR3D
[NeurIPS 2024] Official code repository for the MSR3D paper
Language: Python - Size: 75.7 MB - Last synced at: 7 days ago - Pushed at: 7 days ago - Stars: 60 - Forks: 3

andrewliao11/LongPerceptualThoughts
The official implementation of "LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception"
Language: Python - Size: 3.27 MB - Last synced at: about 11 hours ago - Pushed at: 30 days ago - Stars: 4 - Forks: 2

LAMDASZ-ML/Awesome-LLM-Reasoning-with-NeSy
✨✨Latest Advances on Neuro-Symbolic Learning in the era of Large Language Models
Size: 1.3 MB - Last synced at: 8 days ago - Pushed at: 8 days ago - Stars: 104 - Forks: 5

yangjie-cv/WeThink
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning
Language: Python - Size: 1.58 MB - Last synced at: 15 days ago - Pushed at: 15 days ago - Stars: 3 - Forks: 0

hughplay/Visual-Reasoning-Papers
📄 A curated list of visual reasoning papers.
Language: TeX - Size: 3.09 MB - Last synced at: 20 days ago - Pushed at: 20 days ago - Stars: 26 - Forks: 2

raminguyen/LLMP2
Evaluating ‘Graphical Perception’ with Multimodal Large Language Models
Language: Jupyter Notebook - Size: 508 MB - Last synced at: about 1 month ago - Pushed at: about 1 month ago - Stars: 2 - Forks: 0

salesforce/BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Language: Jupyter Notebook - Size: 6.34 MB - Last synced at: about 1 month ago - Pushed at: 11 months ago - Stars: 5,265 - Forks: 688

NVlabs/Bongard-HOI
[CVPR 2022 (oral)] Bongard-HOI for benchmarking few-shot visual reasoning
Language: Python - Size: 4.49 MB - Last synced at: 13 days ago - Pushed at: over 2 years ago - Stars: 67 - Forks: 7

NVlabs/RelViT
[ICLR 2022] RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning
Language: Python - Size: 259 KB - Last synced at: 13 days ago - Pushed at: almost 3 years ago - Stars: 63 - Forks: 3

keshik6/HourVideo
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
Language: Jupyter Notebook - Size: 8.16 MB - Last synced at: 4 months ago - Pushed at: 4 months ago - Stars: 67 - Forks: 3

hughplay/TVR
💥 Transformation Driven Visual Reasoning - CVPR 2021
Language: Python - Size: 4.87 MB - Last synced at: about 1 month ago - Pushed at: about 2 years ago - Stars: 37 - Forks: 6

sdc17/CrossGET
[ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers.
Language: Python - Size: 11.6 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 26 - Forks: 0

shijx12/XNM-Net
PyTorch implementation of "Explainable and Explicit Visual Reasoning over Scene Graphs"
Language: Python - Size: 14.4 MB - Last synced at: 3 months ago - Pushed at: over 6 years ago - Stars: 94 - Forks: 19

aelnouby/Relational-Networks
PyTorch implementation of the paper "A simple neural network module for relational reasoning", aka Relational Networks for visual reasoning.
Language: Python - Size: 31.3 KB - Last synced at: 6 days ago - Pushed at: about 7 years ago - Stars: 9 - Forks: 0

floodsung/Deep-Reasoning-Papers
Recent papers on neural-symbolic reasoning, logical reasoning, visual reasoning, planning, and other topics connecting deep learning and reasoning
Size: 1.32 MB - Last synced at: 11 months ago - Pushed at: about 3 years ago - Stars: 293 - Forks: 34

MILVLG/mcan-vqa
Deep Modular Co-Attention Networks for Visual Question Answering
Language: Python - Size: 1.84 MB - Last synced at: about 1 year ago - Pushed at: over 4 years ago - Stars: 432 - Forks: 88

csbobby/STAR_Benchmark
Language: Python - Size: 1.24 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 23 - Forks: 2

jaleedkhan/neusire
NeuSyRE: A Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment
Language: Jupyter Notebook - Size: 46.6 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 7 - Forks: 3

WellyZhang/RAVEN
RAVEN: A Dataset for Relational and Analogical Visual rEasoNing
Language: Python - Size: 102 KB - Last synced at: over 1 year ago - Pushed at: over 4 years ago - Stars: 135 - Forks: 26

wentaoheunnc/HCV-ARR
[AAAI 2023] Hierarchical ConViT with Attention-based Relational Reasoner for Visual Analogical Reasoning
Language: Python - Size: 5.58 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 1 - Forks: 0

ethanjperez/film Fork of facebookresearch/clevr-iep
FiLM: Visual Reasoning with a General Conditioning Layer
Language: Python - Size: 2.48 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 281 - Forks: 55

aligholami/hexia
Mid-level PyTorch-based framework for Visual Question Answering.
Language: Python - Size: 20 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 24 - Forks: 2

catalina17/VideoNavQA
An alternative EQA paradigm and informative benchmark + models (BMVC 2019, ViGIL 2019 spotlight)
Language: Python - Size: 5.17 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 23 - Forks: 1

cobanov/image-captioning
Image captioning using Python and BLIP
Language: Python - Size: 28.2 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 10 - Forks: 3

marialymperaiou/knowledge-enhanced-multimodal-learning
A list of research papers on knowledge-enhanced multimodal learning
Size: 20.5 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

rs9000/VisualReasoning_MMnet
Visual reasoning modular memory network
Language: Python - Size: 9.33 MB - Last synced at: about 2 years ago - Pushed at: over 6 years ago - Stars: 2 - Forks: 0

alexmirrington/gat-vqa
Source code for my honours thesis: "Graph Attention Networks for Compositional Visual Question Answering"
Language: Python - Size: 474 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

Sina-Baharlou/VisualGenome-to-Depth
Convert RGB images of the Visual Genome dataset to depth maps.
Language: Python - Size: 236 KB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 3 - Forks: 0

markvasin/openvqa Fork of MILVLG/openvqa
Implementation of the VQA model from my MSc project
Language: Python - Size: 1.97 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 1 - Forks: 1

WellyZhang/ALANS
Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning
Language: Python - Size: 2.59 MB - Last synced at: over 2 years ago - Pushed at: almost 3 years ago - Stars: 8 - Forks: 1

WellyZhang/ACRE
ACRE: Abstract Causal REasoning Beyond Covariation
Language: Python - Size: 2.52 MB - Last synced at: over 2 years ago - Pushed at: over 3 years ago - Stars: 14 - Forks: 1

WellyZhang/PrAE
Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
Language: Python - Size: 44.9 KB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 13 - Forks: 2

WellyZhang/CoPINet
Learning Perceptual Inference by Contrasting
Language: Python - Size: 23.4 KB - Last synced at: over 2 years ago - Pushed at: about 3 years ago - Stars: 25 - Forks: 3

markvasin/MSc-Project
Multimodal Learning and Reasoning for Visual Question Answering
Language: TeX - Size: 42.1 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 2 - Forks: 0

alexmirrington/honours-thesis
LaTeX files for my honours thesis: "Graph Attention Networks for Compositional Visual Question Answering"
Language: TeX - Size: 14.1 MB - Last synced at: over 2 years ago - Pushed at: almost 4 years ago - Stars: 2 - Forks: 1

jaehyunnn/RelationalNetwork_pytorch
An unofficial implementation of Relational Network [A. Santoro et al., 2017] (PyTorch)
Language: Python - Size: 7.81 KB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 2 - Forks: 0

markvasin/nscl_reproducability_challenge Fork of COMP6248-Reproducability-Challenge/nscl_reproducability_challenge
Reproducibility Challenge - The Neuro-Symbolic Concept Learner
Language: Jupyter Notebook - Size: 15.1 MB - Last synced at: over 2 years ago - Pushed at: over 4 years ago - Stars: 0 - Forks: 0
