Topic: "visual-grounding"
TheShadow29/awesome-grounding
awesome grounding: A curated list of research papers in visual grounding
Size: 172 KB - Last synced at: 2 days ago - Pushed at: about 2 years ago - Stars: 1,069 - Forks: 100

Charles-Xie/awesome-described-object-detection
A curated list of papers and resources related to Described Object Detection, Open-Vocabulary/Open-World Object Detection, and Referring Expression Comprehension. Updated frequently; pull requests welcome.
Size: 40 KB - Last synced at: 10 days ago - Pushed at: 13 days ago - Stars: 265 - Forks: 22

rhett-chen/Robotic-grasping-papers
Paper list on robotic grasping and related work
Size: 210 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 251 - Forks: 15

daveredrum/ScanRefer
[ECCV 2020] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
Language: Python - Size: 36.4 MB - Last synced at: 12 months ago - Pushed at: about 2 years ago - Stars: 211 - Forks: 28

LeapLabTHU/Pseudo-Q
[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Language: Python - Size: 22.9 MB - Last synced at: 18 days ago - Pushed at: 9 months ago - Stars: 148 - Forks: 10

linhuixiao/Awesome-Visual-Grounding
[TPAMI reviewing] Towards Visual Grounding: A Survey
Language: Shell - Size: 2.87 MB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 136 - Forks: 16

seanzhuh/SeqTR
SeqTR: A Simple yet Universal Network for Visual Grounding
Language: Python - Size: 3.46 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 130 - Forks: 14

antoyang/TubeDETR
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers
Language: Python - Size: 93.8 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 127 - Forks: 8

jianghaojun/Awesome-3D-Vision-and-Language
A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering and 3D Dense Caption) papers and datasets.
Size: 33.2 KB - Last synced at: 9 days ago - Pushed at: about 2 years ago - Stars: 97 - Forks: 5

yangli18/VLTVG
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022
Language: Python - Size: 603 KB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 91 - Forks: 8

ChenyunWu/PhraseCutDataset
Dataset API for "PhraseCut: Language-based Image Segmentation in the Wild"
Language: Jupyter Notebook - Size: 15.5 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 91 - Forks: 10

JerryX1110/awesome-rvos
Referring Video Object Segmentation / Multi-Object Tracking Repo
Language: Python - Size: 79.1 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 87 - Forks: 4

3dlg-hcvc/M3DRef-CLIP
[ICCV 2023] Multi3DRefer: Grounding Text Description to Multiple 3D Objects
Language: Python - Size: 1.52 MB - Last synced at: 16 days ago - Pushed at: about 1 year ago - Stars: 82 - Forks: 4

yanmin-wu/EDA
[CVPR 2023] EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
Language: Python - Size: 2.35 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 76 - Forks: 2

MultimodalGeo/GeoText-1652
An official repo for the ECCV 2024 paper "Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching"
Language: Python - Size: 41 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 67 - Forks: 2

TheShadow29/vognet-pytorch
[CVPR20] Video Object Grounding using Semantic Roles in Language Description (https://arxiv.org/abs/2003.10606)
Language: Python - Size: 3.45 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 67 - Forks: 7

doc-doc/vRGV
Visual Relation Grounding in Videos (ECCV'20, Spotlight)
Language: Python - Size: 78.9 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 55 - Forks: 7

svip-lab/LBYLNet
[CVPR2021] Look before you leap: learning landmark features for one-stage visual grounding.
Language: Python - Size: 12.8 MB - Last synced at: 20 days ago - Pushed at: over 3 years ago - Stars: 47 - Forks: 10

chihyaoma/cyclical-visual-captioning
PyTorch code for: Learning to Generate Grounded Visual Captions without Localization Supervision
Language: Python - Size: 923 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 41 - Forks: 3

zlccccc/3DVL_Codebase
[CVPR2022 Oral] 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds
Language: Python - Size: 69.4 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 40 - Forks: 4

daveredrum/D3Net
[ECCV2022] D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding
Language: Python - Size: 105 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 36 - Forks: 5

zjukg/DUET
[AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning
Language: Python - Size: 7.63 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 35 - Forks: 8

CurryYuan/ZSVG3D
[CVPR 2024] Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Language: Jupyter Notebook - Size: 21.5 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 29 - Forks: 1

zlccccc/3DVG-Transformer
[ICCV2021] 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds
Language: Python - Size: 15 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 27 - Forks: 4

XJay18/NeuMA
[NeurIPS 2024] NeuMA: Neural Material Adaptor for Visual Grounding of Intrinsic Dynamics
Language: Python - Size: 1.13 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 22 - Forks: 3

RuoyuChen10/VPS
[CVPR 2025 Highlight] Interpreting Object-level Foundation Models via Visual Precision Search
Language: Jupyter Notebook - Size: 17.4 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 16 - Forks: 0

xuyang-liu16/VGDiffZero
[ICASSP 2024] VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders
Language: Python - Size: 1.07 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 14 - Forks: 1

CurryYuan/PhraseRefer
Toward Explainable and Fine-Grained 3D Grounding through Referring Textual Phrases
Language: JavaScript - Size: 24.6 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 7 - Forks: 0

1989Ryan/paragon
Code for ICRA paper: Differentiable parsing and visual grounding of human language instructions for object placement
Language: Python - Size: 6.96 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 7 - Forks: 0

marialymperaiou/knowledge-enhanced-multimodal-learning
A list of research papers on knowledge-enhanced multimodal learning
Size: 20.5 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

CompGuessWhat/comp_probing
Code used to train probing classifiers in the attribute prediction task
Language: Python - Size: 16.6 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 5 - Forks: 0

gorjanradevski/text2atlas
Codebase for "Learning to ground medical text in a 3D human atlas (CoNLL 2020)".
Language: Python - Size: 9.16 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 1

JHKim-snu/PGA
[IROS 2024] PGA: Personalizing Grasping Agents with Single Human-Robot Interaction
Language: Python - Size: 34.8 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 4 - Forks: 0

akskuchi/groovist
GROOViST: A Metric for Grounding Objects in Visual Storytelling – EMNLP 2023
Language: Python - Size: 10 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 1

lparolari/harlequin
Code and DataLoader for the Harlequin dataset 🎨 described in the paper "Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension", presented at ICPR'24
Language: Python - Size: 3.42 MB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

scofield7419/MUIE
MUIE: Multimodal Universal Information Extraction
Language: JavaScript - Size: 8.83 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

bwittmann/TransformerRefer
Utilizing a transformer-based object detector for the task of 3D visual grounding.
Language: Python - Size: 183 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

tony10101105/Vigor
[WACV'25] Data-Efficient 3D Visual Grounding via Order-Aware Referring
Language: C++ - Size: 38.2 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

timbmg/belief
Implementation of the master's thesis "Belief State for Visually Grounded, Task-Oriented Neural Dialogue Model"
Language: Jupyter Notebook - Size: 24.1 MB - Last synced at: 17 days ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

3dlg-hcvc/ENet-ScanNet
Helper tools for extracting and projecting ENet features to ScanNet pointclouds.
Language: Python - Size: 21.5 KB - Last synced at: 16 days ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

izhx/Phrase-Grounding-with-Pronoun
[EMNLP 22] Extending Phrase Grounding with Pronouns in Visual Dialogues.
Language: Python - Size: 3.26 MB - Last synced at: 20 days ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

ZhenZHAO/Papers-VisualGrounding
Explore new research topics, visual grounding
Size: 152 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

ChenBarryHu/TransformerVG
TransformerVG - 3D Visual Grounding with Transformers
Language: Python - Size: 58.9 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

antonio-f/Florence-2-test
Florence-2 quick test
Language: Jupyter Notebook - Size: 3.91 MB - Last synced at: 22 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

rorosonoio/Visual-Grounding
Shortened version of the final exam for the Deep Learning course of the University of Trento in 2023.
Language: Jupyter Notebook - Size: 507 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

TarasRashkevych99/visual-grounding
This is a deep learning project focused on the visual grounding task
Language: Python - Size: 148 KB - Last synced at: 26 days ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

YueChenGithub/visual-grounding
HAIS_2GNN: 3D Visual Grounding with Graph and Attention
Language: Python - Size: 13.5 MB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

lparolari/master-thesis
Dissertation for "Weakly Supervised Visual-Textual Grounding based on Concept Similarity" (MS thesis at University of Padua, Italy) - PyTorch implementation: https://github.com/lparolari/weakvtg
Language: TeX - Size: 22.9 MB - Last synced at: 2 days ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

lparolari/weakvtg
PyTorch implementation of the model described in my MS thesis: "Weakly Supervised Visual-Textual Grounding based on Concept Similarity" (https://github.com/lparolari/master-thesis)
Language: Python - Size: 1.97 MB - Last synced at: 2 days ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 1

lparolari/master-thesis-log
A collection of resources (work logs, state-of-the-art scores, experiment trace, scripts and proof-of-concepts) for my MS thesis "Weakly Supervised Visual-Textual Grounding based on Concept Similarity" - https://github.com/lparolari/weakvtg
Language: Jupyter Notebook - Size: 16.5 MB - Last synced at: 2 days ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

lparolari/master-thesis-report
A short, quasi-final summary report on my thesis "Weakly Supervised Visual-Textual Grounding based on Concept Similarity" (MS thesis at University of Padua, Italy) - https://github.com/lparolari/weakvtg
Language: TeX - Size: 289 KB - Last synced at: 2 days ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0
