Links for fulltext (may require subscription):
- Publisher website (DOI): https://doi.org/10.1109/CVPR52733.2024.01949
- Web of Science: WOS:001342515503092
Citations:
- Web of Science: 0

Appears in Collections: Conference Paper

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
| Title | Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding |
|---|---|
| Authors | Yuan, Zhihao; Ren, Jinke; Feng, Chun-Mei; Zhao, Hengshuang; Cui, Shuguang; Li, Zhen |
| Issue Date | 17-Jun-2024 |
| Abstract | 3D Visual Grounding (3DVG) aims at localizing 3D objects based on textual descriptions. Conventional supervised methods for 3DVG often necessitate extensive annotations and a predefined vocabulary, which can be restrictive. To address this issue, we propose a novel visual programming approach for zero-shot open-vocabulary 3DVG, leveraging the capabilities of large language models (LLMs). Our approach begins with a unique dialog-based method, engaging with LLMs to establish a foundational understanding of zero-shot 3DVG. Building on this, we design a visual program that consists of three types of modules, i.e., view-independent, view-dependent, and functional modules. These modules, specifically tailored for 3D scenarios, work collaboratively to perform complex reasoning and inference. Furthermore, we develop an innovative language-object correlation module to extend the scope of existing 3D object detectors into open-vocabulary scenarios. Extensive experiments demonstrate that our zero-shot approach can outperform some supervised baselines, marking a significant stride towards effective 3DVG. Code is available at https://curryyuan.github.io/ZSVG3D. |
| Persistent Identifier | http://hdl.handle.net/10722/350519 |
| ISI Accession Number ID | WOS:001342515503092 |
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Yuan, Zhihao | - |
| dc.contributor.author | Ren, Jinke | - |
| dc.contributor.author | Feng, Chun-Mei | - |
| dc.contributor.author | Zhao, Hengshuang | - |
| dc.contributor.author | Cui, Shuguang | - |
| dc.contributor.author | Li, Zhen | - |
| dc.date.accessioned | 2024-10-29T00:32:02Z | - |
| dc.date.available | 2024-10-29T00:32:02Z | - |
| dc.date.issued | 2024-06-17 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/350519 | - |
| dc.description.abstract | 3D Visual Grounding (3DVG) aims at localizing 3D objects based on textual descriptions. Conventional supervised methods for 3DVG often necessitate extensive annotations and a predefined vocabulary, which can be restrictive. To address this issue, we propose a novel visual programming approach for zero-shot open-vocabulary 3DVG, leveraging the capabilities of large language models (LLMs). Our approach begins with a unique dialog-based method, engaging with LLMs to establish a foundational understanding of zero-shot 3DVG. Building on this, we design a visual program that consists of three types of modules, i.e., view-independent, view-dependent, and functional modules. These modules, specifically tailored for 3D scenarios, work collaboratively to perform complex reasoning and inference. Furthermore, we develop an innovative language-object correlation module to extend the scope of existing 3D object detectors into open-vocabulary scenarios. Extensive experiments demonstrate that our zero-shot approach can outperform some supervised baselines, marking a significant stride towards effective 3DVG. Code is available at https://curryyuan.github.io/ZSVG3D. | - |
| dc.language | eng | - |
| dc.relation.ispartof | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (17/06/2024-21/06/2024, Seattle) | - |
| dc.title | Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding | - |
| dc.type | Conference_Paper | - |
| dc.identifier.doi | 10.1109/CVPR52733.2024.01949 | - |
| dc.identifier.isi | WOS:001342515503092 | - |
