
Conference Paper: Panoptic segformer: Delving deeper into panoptic segmentation with transformers

Title: Panoptic segformer: Delving deeper into panoptic segmentation with transformers
Authors: Li, Z; Wang, W; Xie, E; Yu, Z; Anandkumar, A; Alvarez, JM; Luo, P; Lu, T
Issue Date: 2022
Publisher: IEEE
Citation: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Hybrid), New Orleans, Louisiana, USA, 19-24 June 2022. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, p. 1280-1289
Abstract: Panoptic segmentation combines semantic segmentation and instance segmentation, dividing image contents into two types: things and stuff. We present Panoptic SegFormer, a general framework for panoptic segmentation with transformers. It contains three innovative components: an efficient deeply-supervised mask decoder, a query decoupling strategy, and an improved post-processing method. We also use Deformable DETR, an efficient variant of DETR, to process multi-scale features. Specifically, we supervise the attention modules in the mask decoder in a layer-wise manner. This deep supervision strategy lets the attention modules quickly focus on meaningful semantic regions, improving performance and halving the number of required training epochs compared to Deformable DETR. Our query decoupling strategy separates the responsibilities of the query set and avoids mutual interference between things and stuff. In addition, our post-processing strategy improves performance at no additional cost by jointly considering classification and segmentation qualities to resolve conflicting mask overlaps. Our approach improves accuracy by 6.2% PQ over the baseline DETR model. Panoptic SegFormer achieves state-of-the-art results on COCO test-dev with 56.2% PQ and shows stronger zero-shot robustness than existing methods.
Persistent Identifier: http://hdl.handle.net/10722/315795
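
The post-processing described in the abstract resolves conflicting mask overlaps by jointly weighing classification and segmentation quality. As a rough illustration only (the paper's exact scoring formula is not reproduced here, and the function name `merge_masks` and all thresholds are hypothetical), one common way to realize such mask-wise merging is to score each query by its class confidence times its in-mask probability, then let higher-scoring segments claim pixels first:

```python
import numpy as np

def merge_masks(mask_probs, class_scores, prob_thresh=0.5, score_thresh=0.3):
    """Illustrative mask-wise merging (not the paper's exact method).

    mask_probs:   (Q, H, W) per-query mask probabilities in [0, 1]
    class_scores: (Q,) classification confidence per query
    Returns a (H, W) panoptic map of query indices (-1 = unassigned)
    and the joint confidence per query.
    """
    q, h, w = mask_probs.shape
    binary = mask_probs > prob_thresh
    # Joint confidence: class score times mean probability inside the
    # binarized mask (a stand-in for "segmentation quality").
    quality = np.array([
        mask_probs[i][binary[i]].mean() if binary[i].any() else 0.0
        for i in range(q)
    ])
    conf = class_scores * quality
    panoptic = np.full((h, w), -1, dtype=int)
    # Greedy fill: higher-confidence segments claim pixels first, so
    # overlapping regions go to the most confident query.
    for i in np.argsort(-conf):
        if conf[i] < score_thresh:
            break
        free = binary[i] & (panoptic == -1)
        panoptic[free] = i
    return panoptic, conf
```

In this sketch a pixel covered by two masks ends up in the segment whose joint confidence is higher, which is the kind of conflict resolution the abstract attributes to the improved post-processing.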


DC Field: Value
dc.contributor.author: Li, Z
dc.contributor.author: Wang, W
dc.contributor.author: Xie, E
dc.contributor.author: Yu, Z
dc.contributor.author: Anandkumar, A
dc.contributor.author: Alvarez, JM
dc.contributor.author: Luo, P
dc.contributor.author: Lu, T
dc.date.accessioned: 2022-08-19T09:04:34Z
dc.date.available: 2022-08-19T09:04:34Z
dc.date.issued: 2022
dc.identifier.citation: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Hybrid), New Orleans, Louisiana, USA, 19-24 June 2022. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, p. 1280-1289
dc.identifier.uri: http://hdl.handle.net/10722/315795
dc.description.abstract: Panoptic segmentation combines semantic segmentation and instance segmentation, dividing image contents into two types: things and stuff. We present Panoptic SegFormer, a general framework for panoptic segmentation with transformers. It contains three innovative components: an efficient deeply-supervised mask decoder, a query decoupling strategy, and an improved post-processing method. We also use Deformable DETR, an efficient variant of DETR, to process multi-scale features. Specifically, we supervise the attention modules in the mask decoder in a layer-wise manner. This deep supervision strategy lets the attention modules quickly focus on meaningful semantic regions, improving performance and halving the number of required training epochs compared to Deformable DETR. Our query decoupling strategy separates the responsibilities of the query set and avoids mutual interference between things and stuff. In addition, our post-processing strategy improves performance at no additional cost by jointly considering classification and segmentation qualities to resolve conflicting mask overlaps. Our approach improves accuracy by 6.2% PQ over the baseline DETR model. Panoptic SegFormer achieves state-of-the-art results on COCO test-dev with 56.2% PQ and shows stronger zero-shot robustness than existing methods.
dc.language: eng
dc.publisher: IEEE
dc.relation.ispartof: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
dc.rights: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. Copyright © IEEE.
dc.title: Panoptic segformer: Delving deeper into panoptic segmentation with transformers
dc.type: Conference_Paper
dc.identifier.email: Li, Z: lzq@smail.nju.edu.cn
dc.identifier.email: Wang, W: wangwenhai@pjlab.org.cn
dc.identifier.email: Yu, Z: zhidingy@nvidia.com
dc.identifier.email: Anandkumar, A: aanandkumar@nvidia.com
dc.identifier.email: Alvarez, JM: josea@nvidia.com
dc.identifier.email: Luo, P: pluo@hku.hk
dc.identifier.email: Lu, T: lutong@nju.edu.cn
dc.identifier.authority: Luo, P=rp02575
dc.identifier.hkuros: 335571
dc.identifier.spage: 1280
dc.identifier.epage: 1289
dc.publisher.place: United States
