Conference Paper: 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds

Title: 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds
Authors: Zhao, Lichen; Cai, Daigang; Sheng, Lu; Xu, Dong
Issue Date: 2021
Citation: Proceedings of the IEEE International Conference on Computer Vision, 2021, p. 2908-2917
Abstract: Visual grounding on 3D point clouds is an emerging vision and language task that benefits various applications in understanding the 3D visual world. By formulating this task as a grounding-by-detection problem, lots of recent works focus on how to exploit more powerful detectors and comprehensive language features, but (1) how to model complex relations for generating context-aware object proposals and (2) how to leverage proposal relations to distinguish the true target object from similar proposals are not fully studied yet. Inspired by the well-known transformer architecture, we propose a relation-aware visual grounding method on 3D point clouds, named as 3DVG-Transformer, to fully utilize the contextual clues for relation-enhanced proposal generation and cross-modal proposal disambiguation, which are enabled by a newly designed coordinate-guided contextual aggregation (CCA) module in the object proposal generation stage, and a multiplex attention (MA) module in the cross-modal feature fusion stage. We validate that our 3DVG-Transformer outperforms the state-of-the-art methods by a large margin, on two point cloud-based visual grounding datasets, ScanRefer and Nr3D/Sr3D from ReferIt3D, especially for complex scenarios containing multiple objects of the same category.
Persistent Identifier: http://hdl.handle.net/10722/321974
ISSN: 1550-5499
2023 SCImago Journal Rankings: 12.263
ISI Accession Number ID: WOS:000797698903012

 

DC Field: Value
dc.contributor.author: Zhao, Lichen
dc.contributor.author: Cai, Daigang
dc.contributor.author: Sheng, Lu
dc.contributor.author: Xu, Dong
dc.date.accessioned: 2022-11-03T02:22:44Z
dc.date.available: 2022-11-03T02:22:44Z
dc.date.issued: 2021
dc.identifier.citation: Proceedings of the IEEE International Conference on Computer Vision, 2021, p. 2908-2917
dc.identifier.issn: 1550-5499
dc.identifier.uri: http://hdl.handle.net/10722/321974
dc.description.abstract: (identical to the abstract given above)
dc.language: eng
dc.relation.ispartof: Proceedings of the IEEE International Conference on Computer Vision
dc.title: 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds
dc.type: Conference_Paper
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1109/ICCV48922.2021.00292
dc.identifier.scopus: eid_2-s2.0-85120944689
dc.identifier.spage: 2908
dc.identifier.epage: 2917
dc.identifier.isi: WOS:000797698903012
