Transformer3D-Det: Improving 3D Object Detection by Vote Refinement

Zhao, Lichen; Guo, Jinyang; Xu, Dong; Sheng, Lu

Publisher Website: 10.1109/TCSVT.2021.3102025
Scopus: eid_2-s2.0-85112624645
WOS: WOS:000725812500018
Appears in Collections:
- Computer Science: Journal/Magazine Articles

Title	Transformer3D-Det: Improving 3D Object Detection by Vote Refinement
Authors	Zhao, Lichen; Guo, Jinyang; Xu, Dong; Sheng, Lu
Keywords	3D object detection; neural network; Point cloud; transformer
Issue Date	2021
Citation	IEEE Transactions on Circuits and Systems for Video Technology, 2021, v. 31, n. 12, p. 4735-4746. DOI: http://dx.doi.org/10.1109/TCSVT.2021.3102025
Abstract	Voting-based methods (e.g., VoteNet) have achieved promising results for 3D object detection. However, the simple voting operation in VoteNet may lead to less accurate voting results that are far away from the true object centers. In this work, we propose a simple but effective 3D object detection method called Transformer3D-Det (T3D), in which we additionally introduce a transformer based vote refinement module to refine the voting results of VoteNet and can thus significantly improve the 3D object detection performance. Specifically, our T3D framework consists of three modules: a vote generation module, a vote refinement module, and a bounding box generation module. Given an input point cloud, we first utilize the vote generation module to generate multiple coarse vote clusters. Then, the clustered coarse votes will be refined by using our transformer based vote refinement module to produce more accurate and meaningful votes. Finally, the bounding box generation module takes the refined vote clusters as the input and generates the final detection result for the input point cloud. To alleviate the impact of inaccurate votes, we also propose a new non-vote loss function to train our T3D. As a result, our T3D framework can achieve better 3D object detection performance. Comprehensive experiments on two benchmark datasets ScanNetV2 and SUN RGB-D demonstrate the effectiveness of our T3D framework for 3D object detection.
Persistent Identifier	http://hdl.handle.net/10722/321957
ISSN	1051-8215 (2021 Impact Factor: 5.859; 2020 SCImago Journal Rankings: 0.873)
ISI Accession Number ID	WOS:000725812500018
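
The three-stage pipeline described in the abstract (vote generation, vote refinement, bounding box generation) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the radius-based grouping, the single-head self-attention layer, and the mean-based center proposal are all illustrative assumptions. The abstract does not specify the network layers or the non-vote loss, so learned feature extraction, box size and class regression, and the loss are omitted.

```python
import numpy as np


def generate_votes(seeds, offsets):
    """Vote generation: each seed point casts a vote by adding its predicted offset.

    In a real detector the offsets come from a learned network; here they are
    passed in directly (assumption for illustration).
    """
    return seeds + offsets


def group_votes(votes, centers, radius):
    """Form coarse vote clusters: indices of the votes within `radius` of each center."""
    # Pairwise distances between K cluster centers and N votes, shape (K, N).
    dist = np.linalg.norm(votes[None, :, :] - centers[:, None, :], axis=-1)
    return [np.flatnonzero(row <= radius) for row in dist]


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def refine_cluster(cluster_votes, w_q, w_k, w_v):
    """Hypothetical refinement step: single-head self-attention over one cluster.

    The attention output is added to the vote coordinates as a residual
    correction. w_q and w_k have shape (3, d); w_v has shape (3, 3).
    """
    q = cluster_votes @ w_q
    k = cluster_votes @ w_k
    v = cluster_votes @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (M, M), each row sums to 1
    return cluster_votes + attn @ v


def detect_centers(seeds, offsets, centers, radius, w_q, w_k, w_v):
    """Run the three stages and return one proposed object center per non-empty cluster."""
    votes = generate_votes(seeds, offsets)
    proposals = []
    for idx in group_votes(votes, centers, radius):
        if idx.size == 0:
            continue  # no votes fell into this cluster
        refined = refine_cluster(votes[idx], w_q, w_k, w_v)
        # Stand-in for the box generation module: only the center is proposed.
        proposals.append(refined.mean(axis=0))
    return np.array(proposals)
```

With all-zero `w_v`, `refine_cluster` returns its input unchanged, which makes it easy to check the surrounding stages in isolation before plugging in learned weights.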

DC Field	Value	Language
dc.contributor.author	Zhao, Lichen	-
dc.contributor.author	Guo, Jinyang	-
dc.contributor.author	Xu, Dong	-
dc.contributor.author	Sheng, Lu	-
dc.date.accessioned	2022-11-03T02:22:37Z	-
dc.date.available	2022-11-03T02:22:37Z	-
dc.date.issued	2021	-
dc.identifier.citation	IEEE Transactions on Circuits and Systems for Video Technology, 2021, v. 31, n. 12, p. 4735-4746	-
dc.identifier.issn	1051-8215	-
dc.identifier.uri	http://hdl.handle.net/10722/321957	-
dc.language	eng	-
dc.relation.ispartof	IEEE Transactions on Circuits and Systems for Video Technology	-
dc.subject	3D object detection	-
dc.subject	neural network	-
dc.subject	Point cloud	-
dc.subject	transformer	-
dc.title	Transformer3D-Det: Improving 3D Object Detection by Vote Refinement	-
dc.type	Article	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1109/TCSVT.2021.3102025	-
dc.identifier.scopus	eid_2-s2.0-85112624645	-
dc.identifier.volume	31	-
dc.identifier.issue	12	-
dc.identifier.spage	4735	-
dc.identifier.epage	4746	-
dc.identifier.eissn	1558-2205	-
dc.identifier.isi	WOS:000725812500018	-
