An open API service providing repository metadata for many open source software ecosystems.

Topic: "visual-grounding"

TheShadow29/awesome-grounding

awesome grounding: A curated list of research papers in visual grounding

Size: 172 KB - Last synced at: 2 days ago - Pushed at: about 2 years ago - Stars: 1,069 - Forks: 100

Charles-Xie/awesome-described-object-detection

A curated list of papers and resources related to Described Object Detection, Open-Vocabulary/Open-World Object Detection and Referring Expression Comprehension. Updated frequently; pull requests are welcome.

Size: 40 KB - Last synced at: 10 days ago - Pushed at: 13 days ago - Stars: 265 - Forks: 22

rhett-chen/Robotic-grasping-papers

A paper list on robotic grasping and some related work

Size: 210 KB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 251 - Forks: 15

daveredrum/ScanRefer

[ECCV 2020] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language

Language: Python - Size: 36.4 MB - Last synced at: 12 months ago - Pushed at: about 2 years ago - Stars: 211 - Forks: 28

LeapLabTHU/Pseudo-Q

[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding

Language: Python - Size: 22.9 MB - Last synced at: 18 days ago - Pushed at: 9 months ago - Stars: 148 - Forks: 10

linhuixiao/Awesome-Visual-Grounding

[TPAMI reviewing] Towards Visual Grounding: A Survey

Language: Shell - Size: 2.87 MB - Last synced at: 9 days ago - Pushed at: about 1 month ago - Stars: 136 - Forks: 16

seanzhuh/SeqTR

SeqTR: A Simple yet Universal Network for Visual Grounding

Language: Python - Size: 3.46 MB - Last synced at: 6 months ago - Pushed at: 6 months ago - Stars: 130 - Forks: 14

antoyang/TubeDETR

[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers

Language: Python - Size: 93.8 KB - Last synced at: over 1 year ago - Pushed at: about 2 years ago - Stars: 127 - Forks: 8

jianghaojun/Awesome-3D-Vision-and-Language

A collection of 3D vision and language (e.g., 3D Visual Grounding, 3D Question Answering and 3D Dense Caption) papers and datasets.

Size: 33.2 KB - Last synced at: 9 days ago - Pushed at: about 2 years ago - Stars: 97 - Forks: 5

yangli18/VLTVG

Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022

Language: Python - Size: 603 KB - Last synced at: 5 months ago - Pushed at: over 2 years ago - Stars: 91 - Forks: 8

ChenyunWu/PhraseCutDataset

Dataset API for "PhraseCut: Language-based Image Segmentation in the Wild"

Language: Jupyter Notebook - Size: 15.5 MB - Last synced at: about 1 year ago - Pushed at: almost 5 years ago - Stars: 91 - Forks: 10

JerryX1110/awesome-rvos

Referring Video Object Segmentation / Multi-Object Tracking Repo

Language: Python - Size: 79.1 KB - Last synced at: 6 days ago - Pushed at: over 1 year ago - Stars: 87 - Forks: 4

3dlg-hcvc/M3DRef-CLIP

[ICCV 2023] Multi3DRefer: Grounding Text Description to Multiple 3D Objects

Language: Python - Size: 1.52 MB - Last synced at: 16 days ago - Pushed at: about 1 year ago - Stars: 82 - Forks: 4

yanmin-wu/EDA

[CVPR 2023] EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding

Language: Python - Size: 2.35 MB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 76 - Forks: 2

MultimodalGeo/GeoText-1652

An official repo for ECCV 2024 Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching

Language: Python - Size: 41 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 67 - Forks: 2

TheShadow29/vognet-pytorch

[CVPR20] Video Object Grounding using Semantic Roles in Language Description (https://arxiv.org/abs/2003.10606)

Language: Python - Size: 3.45 MB - Last synced at: about 2 years ago - Pushed at: almost 5 years ago - Stars: 67 - Forks: 7

doc-doc/vRGV

Visual Relation Grounding in Videos (ECCV'20, Spotlight)

Language: Python - Size: 78.9 MB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 55 - Forks: 7

svip-lab/LBYLNet

[CVPR2021] Look before you leap: learning landmark features for one-stage visual grounding.

Language: Python - Size: 12.8 MB - Last synced at: 20 days ago - Pushed at: over 3 years ago - Stars: 47 - Forks: 10

chihyaoma/cyclical-visual-captioning

PyTorch code for: Learning to Generate Grounded Visual Captions without Localization Supervision

Language: Python - Size: 923 KB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 41 - Forks: 3

zlccccc/3DVL_Codebase

[CVPR2022 Oral] 3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds

Language: Python - Size: 69.4 MB - Last synced at: almost 2 years ago - Pushed at: about 2 years ago - Stars: 40 - Forks: 4

daveredrum/D3Net

[ECCV2022] D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding

Language: Python - Size: 105 MB - Last synced at: about 1 year ago - Pushed at: over 2 years ago - Stars: 36 - Forks: 5

zjukg/DUET

[Paper][AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

Language: Python - Size: 7.63 MB - Last synced at: about 1 year ago - Pushed at: about 1 year ago - Stars: 35 - Forks: 8

CurryYuan/ZSVG3D

[CVPR 2024] Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding

Language: Jupyter Notebook - Size: 21.5 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 29 - Forks: 1

zlccccc/3DVG-Transformer

[ICCV2021] 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds

Language: Python - Size: 15 MB - Last synced at: almost 2 years ago - Pushed at: almost 3 years ago - Stars: 27 - Forks: 4

XJay18/NeuMA

[NeurIPS 2024] NeuMA: Neural Material Adaptor for Visual Grounding of Intrinsic Dynamics

Language: Python - Size: 1.13 MB - Last synced at: 5 months ago - Pushed at: 5 months ago - Stars: 22 - Forks: 3

RuoyuChen10/VPS

[CVPR 2025 Highlight] Interpreting Object-level Foundation Models via Visual Precision Search

Language: Jupyter Notebook - Size: 17.4 MB - Last synced at: 17 days ago - Pushed at: 17 days ago - Stars: 16 - Forks: 0

xuyang-liu16/VGDiffZero

[ICASSP 2024] VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders

Language: Python - Size: 1.07 MB - Last synced at: 2 months ago - Pushed at: 2 months ago - Stars: 14 - Forks: 1

CurryYuan/PhraseRefer

Toward Explainable and Fine-Grained 3D Grounding through Referring Textual Phrases

Language: JavaScript - Size: 24.6 MB - Last synced at: almost 2 years ago - Pushed at: almost 2 years ago - Stars: 7 - Forks: 0

1989Ryan/paragon

Code for ICRA paper: Differentiable parsing and visual grounding of human language instructions for object placement

Language: Python - Size: 6.96 MB - Last synced at: about 2 years ago - Pushed at: about 2 years ago - Stars: 7 - Forks: 0

marialymperaiou/knowledge-enhanced-multimodal-learning

A list of research papers on knowledge-enhanced multimodal learning

Size: 20.5 KB - Last synced at: almost 2 years ago - Pushed at: over 2 years ago - Stars: 5 - Forks: 0

CompGuessWhat/comp_probing

Code used to train probing classifiers in the attribute prediction task

Language: Python - Size: 16.6 KB - Last synced at: about 2 years ago - Pushed at: almost 3 years ago - Stars: 5 - Forks: 0

gorjanradevski/text2atlas

Codebase for "Learning to ground medical text in a 3D human atlas (CoNLL 2020)".

Language: Python - Size: 9.16 MB - Last synced at: about 2 years ago - Pushed at: over 4 years ago - Stars: 5 - Forks: 1

JHKim-snu/PGA

[IROS 2024] PGA: Personalizing Grasping Agents with Single Human-Robot Interaction

Language: Python - Size: 34.8 MB - Last synced at: 9 months ago - Pushed at: 9 months ago - Stars: 4 - Forks: 0

akskuchi/groovist

GROOViST: A Metric for Grounding Objects in Visual Storytelling – EMNLP 2023

Language: Python - Size: 10 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 2 - Forks: 1

lparolari/harlequin

Code and DataLoader for the Harlequin dataset 🎨 described in the paper "Harlequin: Color-driven Generation of Synthetic Data for Referring Expression Comprehension", presented at ICPR'24

Language: Python - Size: 3.42 MB - Last synced at: 2 days ago - Pushed at: 5 months ago - Stars: 2 - Forks: 0

scofield7419/MUIE

MUIE: Multimodal Universal Information Extraction

Language: JavaScript - Size: 8.83 MB - Last synced at: about 2 months ago - Pushed at: 8 months ago - Stars: 2 - Forks: 0

bwittmann/TransformerRefer

Utilizing a transformer-based object detector for the task of 3D visual grounding.

Language: Python - Size: 183 MB - Last synced at: over 1 year ago - Pushed at: over 3 years ago - Stars: 2 - Forks: 0

tony10101105/Vigor

[WACV'25] Data-Efficient 3D Visual Grounding via Order-Aware Referring

Language: C++ - Size: 38.2 MB - Last synced at: 3 months ago - Pushed at: 3 months ago - Stars: 1 - Forks: 0

timbmg/belief

Implementation of Master Thesis on "Belief State for Visually Grounded, Task-Oriented Neural Dialogue Model"

Language: Jupyter Notebook - Size: 24.1 MB - Last synced at: 17 days ago - Pushed at: 9 months ago - Stars: 1 - Forks: 0

3dlg-hcvc/ENet-ScanNet

Helper tools for extracting and projecting ENet features to ScanNet pointclouds.

Language: Python - Size: 21.5 KB - Last synced at: 16 days ago - Pushed at: almost 2 years ago - Stars: 1 - Forks: 0

izhx/Phrase-Grounding-with-Pronoun

[EMNLP 22] Extending Phrase Grounding with Pronouns in Visual Dialogues.

Language: Python - Size: 3.26 MB - Last synced at: 20 days ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

ZhenZHAO/Papers-VisualGrounding

Explore new research topics, visual grounding

Size: 152 KB - Last synced at: about 2 years ago - Pushed at: over 2 years ago - Stars: 1 - Forks: 0

ChenBarryHu/TransformerVG

TransformerVG - 3D Visual Grounding with Transformers

Language: Python - Size: 58.9 MB - Last synced at: over 1 year ago - Pushed at: about 3 years ago - Stars: 1 - Forks: 0

antonio-f/Florence-2-test

Florence-2 quick test

Language: Jupyter Notebook - Size: 3.91 MB - Last synced at: 22 days ago - Pushed at: 8 months ago - Stars: 0 - Forks: 0

rorosonoio/Visual-Grounding

Shortened version of the final exam for the Deep Learning course of the University of Trento in 2023.

Language: Jupyter Notebook - Size: 507 KB - Last synced at: over 1 year ago - Pushed at: over 1 year ago - Stars: 0 - Forks: 0

TarasRashkevych99/visual-grounding

This is a deep learning project focused on the visual grounding task

Language: Python - Size: 148 KB - Last synced at: 26 days ago - Pushed at: almost 2 years ago - Stars: 0 - Forks: 0

YueChenGithub/visual-grounding

HAIS_2GNN: 3D Visual Grounding with Graph and Attention

Language: Python - Size: 13.5 MB - Last synced at: about 1 year ago - Pushed at: about 3 years ago - Stars: 0 - Forks: 0

lparolari/master-thesis

Dissertation for "Weakly Supervised Visual-Textual Grounding based on Concept Similarity" (MS thesis at University of Padua, Italy) - PyTorch implementation: https://github.com/lparolari/weakvtg

Language: TeX - Size: 22.9 MB - Last synced at: 2 days ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

lparolari/weakvtg

PyTorch implementation of the model described in my MS thesis: "Weakly Supervised Visual-Textual Grounding based on Concept Similarity" (https://github.com/lparolari/master-thesis)

Language: Python - Size: 1.97 MB - Last synced at: 2 days ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 1

lparolari/master-thesis-log

A collection of resources (work logs, state-of-the-art scores, experiment trace, scripts and proof-of-concepts) for my MS thesis "Weakly Supervised Visual-Textual Grounding based on Concept Similarity" - https://github.com/lparolari/weakvtg

Language: Jupyter Notebook - Size: 16.5 MB - Last synced at: 2 days ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

lparolari/master-thesis-report

A quasi-final short summary report on my thesis "Weakly Supervised Visual-Textual Grounding based on Concept Similarity" (MS thesis at University of Padua, Italy) - https://github.com/lparolari/weakvtg

Language: TeX - Size: 289 KB - Last synced at: 2 days ago - Pushed at: over 3 years ago - Stars: 0 - Forks: 0

Related Topics
computer-vision (14), deep-learning (13), vision-and-language (10), pytorch (9), 3d (7), nlp (6), grounding (5), phrase-grounding (5), point-cloud (5), natural-language-processing (5), multimodal-deep-learning (5), referring-expression-comprehension (4), transformer (4), 3d-vision-and-language (4), scanrefer (3), awesome (3), captioning-videos (3), semi-supervised-learning (2), robotic-manipulation (2), visual-storytelling (2), neural-networks (2), eccv (2), captioning-images (2), machine-learning (2), clip (2), video-understanding (2), awesome-list (2), papers (2), zero-shot-learning (2), robotics (2), cross-modal (2), 3d-vision (2), 3d-visual-grounding (2), dialogue (2), cvpr2022 (2), visual-dialog (2), pytorch-implementation (2), video (2), video-grounding (2), multimodal-information-extraction (1), knowledge-enhanced-multimodal-learning (1), knowledge-enhanced-vision-language (1), universal-information-extraction (1), cuda (1), explainable-ai (1), information-extraction (1), vision-foundation-model (1), tutorial (1), knowledge-graph (1), python (1), multi-task-learning (1), multimodal-large-language-models (1), multimodal-retrieval (1), jupyter-notebook (1), image-to-text (1), story-visualization (1), image-captioning (1), huggingface-transformers (1), vision-and-language-navigation (1), florence-2 (1), colab-notebook (1), vision-and-language-pre-training (1), vision-language-transformer (1), visual-reasoning (1), visual-question-answering (1), visual-commonsense-reasoning (1), localization (1), survey (1), image (1), linguistic (1), multi-modal (1), refer-segmentation (1), refer-vos (1), refering-seg (1), rvos (1), segmentation (1), text (1), youtube-vos (1), arxiv (1), embodied-agent (1), image-grounding (1), language-grounding (1), paper (1), paper-roadmap (1), pytorch-lightning (1), natural-language-generation (1), synthetic-data-generation (1), vision-language (1), visual-linguistic (1), open-vocabulary-detection (1), open-world-object-detection (1), dialog (1), dialogue-systems (1), neural-network (1), nlg (1), nlproc (1), stable-diffusion (1), text-to-image-generation (1), vision-language-model (1), differentiable-physics (1)