Links for fulltext (may require subscription):
- Publisher Website: 10.1109/CVPR46437.2021.00681
- Scopus: eid_2-s2.0-85117131558
- WOS: WOS:000739917307010
Conference Paper: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Title | Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers |
---|---|
Authors | Zheng, Sixiao; Lu, Jiachen; Zhao, Hengshuang; Zhu, Xiatian; Luo, Zekun; Wang, Yabiao; Fu, Yanwei; Feng, Jianfeng; Xiang, Tao; Torr, Philip H.S.; Zhang, Li |
Issue Date | 2021 |
Citation | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2021, p. 6877-6886 |
Abstract | Most recent semantic segmentation methods adopt a fully-convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolution and learns more abstract/semantic visual concepts with larger receptive fields. Since context modeling is critical for segmentation, the latest efforts have been focused on increasing the receptive field, through either dilated/atrous convolutions or inserting attention modules. However, the encoder-decoder based FCN architecture remains unchanged. In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer (i.e., without convolution and resolution reduction) to encode an image as a sequence of patches. With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR). Extensive experiments show that SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes. Particularly, we achieve the first position in the highly competitive ADE20K test server leaderboard on the day of submission. |
Persistent Identifier | http://hdl.handle.net/10722/333514 |
ISSN | 1063-6919 (2023 SCImago Journal Rankings: 10.331) |
ISI Accession Number ID | WOS:000739917307010 |
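The abstract describes encoding an image as a sequence of patches that a pure transformer then processes with global attention at every layer. The patchify step can be sketched as follows; this is a minimal illustration of the general ViT-style tokenization idea, not the authors' actual implementation, and the image and patch sizes are illustrative assumptions:

```python
import numpy as np

def image_to_patch_sequence(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C):
    the token sequence a transformer encoder would embed and attend over,
    with no convolution and no spatial-resolution reduction.
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    # Cut the image into a grid of non-overlapping patches...
    patches = image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
    # ...group the two grid axes together, then flatten each patch to one vector.
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, patch_size * patch_size * c)

# Example: a 480x480 RGB image with 16x16 patches becomes a sequence of
# 30 * 30 = 900 tokens, each a vector of 16 * 16 * 3 = 768 values.
seq = image_to_patch_sequence(np.zeros((480, 480, 3)), patch_size=16)
print(seq.shape)  # (900, 768)
```

A segmentation decoder then only needs to reshape the encoder's output tokens back into a 2D grid and upsample to per-pixel predictions, which is why the abstract pairs this encoder with a "simple decoder."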
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zheng, Sixiao | - |
dc.contributor.author | Lu, Jiachen | - |
dc.contributor.author | Zhao, Hengshuang | - |
dc.contributor.author | Zhu, Xiatian | - |
dc.contributor.author | Luo, Zekun | - |
dc.contributor.author | Wang, Yabiao | - |
dc.contributor.author | Fu, Yanwei | - |
dc.contributor.author | Feng, Jianfeng | - |
dc.contributor.author | Xiang, Tao | - |
dc.contributor.author | Torr, Philip H.S. | - |
dc.contributor.author | Zhang, Li | - |
dc.date.accessioned | 2023-10-06T05:20:05Z | - |
dc.date.available | 2023-10-06T05:20:05Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2021, p. 6877-6886 | - |
dc.identifier.issn | 1063-6919 | - |
dc.identifier.uri | http://hdl.handle.net/10722/333514 | - |
dc.description.abstract | Most recent semantic segmentation methods adopt a fully-convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolution and learns more abstract/semantic visual concepts with larger receptive fields. Since context modeling is critical for segmentation, the latest efforts have been focused on increasing the receptive field, through either dilated/atrous convolutions or inserting attention modules. However, the encoder-decoder based FCN architecture remains unchanged. In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer (i.e., without convolution and resolution reduction) to encode an image as a sequence of patches. With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR). Extensive experiments show that SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes. Particularly, we achieve the first position in the highly competitive ADE20K test server leaderboard on the day of submission. | - |
dc.language | eng | - |
dc.relation.ispartof | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition | - |
dc.title | Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1109/CVPR46437.2021.00681 | - |
dc.identifier.scopus | eid_2-s2.0-85117131558 | - |
dc.identifier.spage | 6877 | - |
dc.identifier.epage | 6886 | - |
dc.identifier.isi | WOS:000739917307010 | - |