Appears in Collections: Conference Paper
Title | Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding |
---|---|
Authors | Yuan, Zhihao; Ren, Jinke; Feng, Chun-Mei; Zhao, Hengshuang; Cui, Shuguang; Li, Zhen |
Issue Date | 17-Jun-2024 |
Abstract | 3D Visual Grounding (3DVG) aims at localizing 3D objects based on textual descriptions. Conventional supervised methods for 3DVG often necessitate extensive annotations and a predefined vocabulary, which can be restrictive. To address this issue, we propose a novel visual programming approach for zero-shot open-vocabulary 3DVG, leveraging the capabilities of large language models (LLMs). Our approach begins with a unique dialog-based method, engaging with LLMs to establish a foundational understanding of zero-shot 3DVG. Building on this, we design a visual program that consists of three types of modules, i.e., view-independent, view-dependent, and functional modules. These modules, specifically tailored for 3D scenarios, work collaboratively to perform complex reasoning and inference. Furthermore, we develop an innovative language-object correlation module to extend the scope of existing 3D object detectors into open-vocabulary scenarios. Extensive experiments demonstrate that our zero-shot approach can outperform some supervised baselines, marking a significant stride towards effective 3DVG. Code is available at https://curryyuan.github.io/ZSVG3D. |
Persistent Identifier | http://hdl.handle.net/10722/350519 |
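The abstract describes visual programs built from view-independent, view-dependent, and functional modules that an LLM composes to resolve a grounding query. The following is a minimal, hypothetical sketch of that composition idea; the module names (`locate`, `closest_to`), the scene representation, and the example query are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of the visual-program idea from the abstract:
# an LLM would translate a query such as "the chair closest to the
# window" into a short program composed of grounding modules.

from dataclasses import dataclass
import math


@dataclass
class Obj3D:
    label: str
    center: tuple  # (x, y, z) in scene coordinates


def locate(objects, label):
    """View-independent module: select objects by category."""
    return [o for o in objects if o.label == label]


def closest_to(candidates, anchors):
    """Functional module: pick the candidate nearest to any anchor."""
    return min(
        candidates,
        key=lambda c: min(math.dist(c.center, a.center) for a in anchors),
    )


# Toy scene with two chairs and a window.
scene = [
    Obj3D("chair", (0.0, 0.0, 0.0)),
    Obj3D("chair", (4.0, 0.0, 0.0)),
    Obj3D("window", (5.0, 0.0, 1.0)),
]

# "Program" for the query "the chair closest to the window":
chairs = locate(scene, "chair")
windows = locate(scene, "window")
target = closest_to(chairs, windows)
print(target.center)  # the chair at (4.0, 0.0, 0.0)
```

In the paper's setting, the open-vocabulary category matching and the program generation itself are handled by learned components (the language-object correlation module and the LLM), which this sketch replaces with exact string matching for brevity.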
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Yuan, Zhihao | - |
dc.contributor.author | Ren, Jinke | - |
dc.contributor.author | Feng, Chun-Mei | - |
dc.contributor.author | Zhao, Hengshuang | - |
dc.contributor.author | Cui, Shuguang | - |
dc.contributor.author | Li, Zhen | - |
dc.date.accessioned | 2024-10-29T00:32:02Z | - |
dc.date.available | 2024-10-29T00:32:02Z | - |
dc.date.issued | 2024-06-17 | - |
dc.identifier.uri | http://hdl.handle.net/10722/350519 | - |
dc.description.abstract | <p>3D Visual Grounding (3DVG) aims at localizing 3D objects based on textual descriptions. Conventional supervised methods for 3DVG often necessitate extensive annotations and a predefined vocabulary, which can be restrictive. To address this issue, we propose a novel visual programming approach for zero-shot open-vocabulary 3DVG, leveraging the capabilities of large language models (LLMs). Our approach begins with a unique dialog-based method, engaging with LLMs to establish a foundational understanding of zero-shot 3DVG. Building on this, we design a visual program that consists of three types of modules, i.e., view-independent, view-dependent, and functional modules. These modules, specifically tailored for 3D scenarios, work collaboratively to perform complex reasoning and inference. Furthermore, we develop an innovative language-object correlation module to extend the scope of existing 3D object detectors into open-vocabulary scenarios. Extensive experiments demonstrate that our zero-shot approach can outperform some supervised baselines, marking a significant stride towards effective 3DVG. Code is available at https://curryyuan.github.io/ZSVG3D.</p> | - |
dc.language | eng | - |
dc.relation.ispartof | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (17/06/2024-21/06/2024, Seattle) | - |
dc.title | Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding | - |
dc.type | Conference_Paper | - |
dc.identifier.doi | 10.1109/CVPR52733.2024.01949 | - |