3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds

Zhao, Lichen; Cai, Daigang; Sheng, Lu; Xu, Dong

Publisher Website: 10.1109/ICCV48922.2021.00292
Scopus: eid_2-s2.0-85120944689
WOS: WOS:000797698903012
Citations:
- Scopus: 0
- Web of Science: 0
Appears in Collections:
- Computer Science: Conference papers

Conference Paper: 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds

Title	3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds
Authors	Zhao, Lichen; Cai, Daigang; Sheng, Lu; Xu, Dong
Issue Date	2021
Citation	Proceedings of the IEEE International Conference on Computer Vision, 2021, p. 2908-2917. DOI: http://dx.doi.org/10.1109/ICCV48922.2021.00292
Abstract	Visual grounding on 3D point clouds is an emerging vision and language task that benefits various applications in understanding the 3D visual world. By formulating this task as a grounding-by-detection problem, lots of recent works focus on how to exploit more powerful detectors and comprehensive language features, but (1) how to model complex relations for generating context-aware object proposals and (2) how to leverage proposal relations to distinguish the true target object from similar proposals are not fully studied yet. Inspired by the well-known transformer architecture, we propose a relation-aware visual grounding method on 3D point clouds, named as 3DVG-Transformer, to fully utilize the contextual clues for relation-enhanced proposal generation and cross-modal proposal disambiguation, which are enabled by a newly designed coordinate-guided contextual aggregation (CCA) module in the object proposal generation stage, and a multiplex attention (MA) module in the cross-modal feature fusion stage. We validate that our 3DVG-Transformer outperforms the state-of-the-art methods by a large margin, on two point cloud-based visual grounding datasets, ScanRefer and Nr3D/Sr3D from ReferIt3D, especially for complex scenarios containing multiple objects of the same category.
Persistent Identifier	http://hdl.handle.net/10722/321974
ISSN	1550-5499 (2023 SCImago Journal Rankings: 12.263)
ISI Accession Number ID	WOS:000797698903012

DC Field	Value	Language
dc.contributor.author	Zhao, Lichen	-
dc.contributor.author	Cai, Daigang	-
dc.contributor.author	Sheng, Lu	-
dc.contributor.author	Xu, Dong	-
dc.date.accessioned	2022-11-03T02:22:44Z	-
dc.date.available	2022-11-03T02:22:44Z	-
dc.date.issued	2021	-
dc.identifier.citation	Proceedings of the IEEE International Conference on Computer Vision, 2021, p. 2908-2917	-
dc.identifier.issn	1550-5499	-
dc.identifier.uri	http://hdl.handle.net/10722/321974	-
dc.description.abstract	Visual grounding on 3D point clouds is an emerging vision and language task that benefits various applications in understanding the 3D visual world. By formulating this task as a grounding-by-detection problem, lots of recent works focus on how to exploit more powerful detectors and comprehensive language features, but (1) how to model complex relations for generating context-aware object proposals and (2) how to leverage proposal relations to distinguish the true target object from similar proposals are not fully studied yet. Inspired by the well-known transformer architecture, we propose a relation-aware visual grounding method on 3D point clouds, named as 3DVG-Transformer, to fully utilize the contextual clues for relation-enhanced proposal generation and cross-modal proposal disambiguation, which are enabled by a newly designed coordinate-guided contextual aggregation (CCA) module in the object proposal generation stage, and a multiplex attention (MA) module in the cross-modal feature fusion stage. We validate that our 3DVG-Transformer outperforms the state-of-the-art methods by a large margin, on two point cloud-based visual grounding datasets, ScanRefer and Nr3D/Sr3D from ReferIt3D, especially for complex scenarios containing multiple objects of the same category.	-
dc.language	eng	-
dc.relation.ispartof	Proceedings of the IEEE International Conference on Computer Vision	-
dc.title	3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds	-
dc.type	Conference_Paper	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1109/ICCV48922.2021.00292	-
dc.identifier.scopus	eid_2-s2.0-85120944689	-
dc.identifier.spage	2908	-
dc.identifier.epage	2917	-
dc.identifier.isi	WOS:000797698903012	-
