File Download
There are no files associated with this item.
Links for fulltext (may require subscription)
- Publisher Website: https://doi.org/10.1109/CVPR52688.2022.00831
- Scopus: eid_2-s2.0-85132707307
- WOS: WOS:000870759101053
Conference Paper: Stratified Transformer for 3D Point Cloud Segmentation
Title | Stratified Transformer for 3D Point Cloud Segmentation |
---|---|
Authors | Lai, Xin; Liu, Jianhui; Jiang, Li; Wang, Liwei; Zhao, Hengshuang; Liu, Shu; Qi, Xiaojuan; Jia, Jiaya |
Keywords | 3D from multi-view and sensors; grouping and shape analysis; Scene analysis and understanding; Segmentation |
Issue Date | 2022 |
Citation | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, v. 2022-June, p. 8490-8499 |
Abstract | 3D point cloud segmentation has made tremendous progress in recent years. Most current methods focus on aggregating local features, but fail to directly model long-range dependencies. In this paper, we propose Stratified Transformer that is able to capture long-range contexts and demonstrates strong generalization ability and high performance. Specifically, we first put forward a novel key sampling strategy. For each query point, we sample nearby points densely and distant points sparsely as its keys in a stratified way, which enables the model to enlarge the effective receptive field and enjoy long-range contexts at a low computational cost. Also, to combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information, which facilitates convergence and boosts performance. Besides, we adopt contextual relative position encoding to adaptively capture position information. Finally, a memory-efficient implementation is introduced to overcome the issue of varying point numbers in each window. Extensive experiments demonstrate the effectiveness and superiority of our method on S3DIS, ScanNetv2 and ShapeNetPart datasets. Code is available at https://github.com/dvlab-research/Stratified-Transformer. |
Persistent Identifier | http://hdl.handle.net/10722/333542 |
ISSN | 1063-6919 (2023 SCImago Journal Rankings: 10.331) |
ISI Accession Number ID | WOS:000870759101053 |
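
As a quick illustration of the stratified key-sampling idea described in the abstract, the sketch below gathers dense nearby keys and sparse (downsampled) distant keys for a single query point. This is not the authors' implementation (see the linked repository for that); the function names, the neighborhood radius, and the voxel size are illustrative assumptions, and voxel-grid subsampling stands in for whatever coarser sampling the model applies to distant points.

```python
# Minimal sketch of stratified key sampling: nearby points are kept densely
# as attention keys, while distant points are voxel-downsampled and kept
# sparsely. All names and parameters here are illustrative assumptions.
import numpy as np

def voxel_downsample(points, voxel_size):
    """Keep one representative point per occupied voxel (sparse sampling)."""
    cells = np.floor(points / voxel_size).astype(np.int64)
    _, first_idx = np.unique(cells, axis=0, return_index=True)
    return points[np.sort(first_idx)]

def stratified_keys(query, points, near_radius=0.5, coarse_voxel=0.4):
    """Return dense nearby keys plus sparse distant keys for one query point."""
    dist = np.linalg.norm(points - query, axis=1)
    near = points[dist <= near_radius]                    # dense: all nearby points
    far = voxel_downsample(points[dist > near_radius],    # sparse: distant points,
                           coarse_voxel)                  # one per coarse voxel
    return np.concatenate([near, far], axis=0)

# Usage: on a random cloud, the stratified key set is far smaller than the
# full cloud, which is how the effective receptive field grows cheaply.
pts = np.random.rand(10_000, 3).astype(np.float32)
keys = stratified_keys(pts[0], pts)
print(len(keys), "keys vs", len(pts), "points")
```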
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lai, Xin | - |
dc.contributor.author | Liu, Jianhui | - |
dc.contributor.author | Jiang, Li | - |
dc.contributor.author | Wang, Liwei | - |
dc.contributor.author | Zhao, Hengshuang | - |
dc.contributor.author | Liu, Shu | - |
dc.contributor.author | Qi, Xiaojuan | - |
dc.contributor.author | Jia, Jiaya | - |
dc.date.accessioned | 2023-10-06T05:20:19Z | - |
dc.date.available | 2023-10-06T05:20:19Z | - |
dc.date.issued | 2022 | - |
dc.identifier.citation | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, v. 2022-June, p. 8490-8499 | - |
dc.identifier.issn | 1063-6919 | - |
dc.identifier.uri | http://hdl.handle.net/10722/333542 | - |
dc.description.abstract | 3D point cloud segmentation has made tremendous progress in recent years. Most current methods focus on aggregating local features, but fail to directly model long-range dependencies. In this paper, we propose Stratified Transformer that is able to capture long-range contexts and demonstrates strong generalization ability and high performance. Specifically, we first put forward a novel key sampling strategy. For each query point, we sample nearby points densely and distant points sparsely as its keys in a stratified way, which enables the model to enlarge the effective receptive field and enjoy long-range contexts at a low computational cost. Also, to combat the challenges posed by irregular point arrangements, we propose first-layer point embedding to aggregate local information, which facilitates convergence and boosts performance. Besides, we adopt contextual relative position encoding to adaptively capture position information. Finally, a memory-efficient implementation is introduced to overcome the issue of varying point numbers in each window. Extensive experiments demonstrate the effectiveness and superiority of our method on S3DIS, ScanNetv2 and ShapeNetPart datasets. Code is available at https://github.com/dvlab-research/Stratified-Transformer. | - |
dc.language | eng | - |
dc.relation.ispartof | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | - |
dc.subject | 3D from multi-view and sensors | - |
dc.subject | grouping and shape analysis | - |
dc.subject | Scene analysis and understanding | - |
dc.subject | Segmentation | - |
dc.title | Stratified Transformer for 3D Point Cloud Segmentation | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1109/CVPR52688.2022.00831 | - |
dc.identifier.scopus | eid_2-s2.0-85132707307 | - |
dc.identifier.volume | 2022-June | - |
dc.identifier.spage | 8490 | - |
dc.identifier.epage | 8499 | - |
dc.identifier.isi | WOS:000870759101053 | - |