Conference Paper: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Title: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Authors: Zheng, Sixiao; Lu, Jiachen; Zhao, Hengshuang; Zhu, Xiatian; Luo, Zekun; Wang, Yabiao; Fu, Yanwei; Feng, Jianfeng; Xiang, Tao; Torr, Philip H.S.; Zhang, Li
Issue Date: 2021
Citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2021, p. 6877-6886
Abstract: Most recent semantic segmentation methods adopt a fully-convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolution and learns more abstract/semantic visual concepts with larger receptive fields. Since context modeling is critical for segmentation, the latest efforts have been focused on increasing the receptive field, through either dilated/atrous convolutions or inserting attention modules. However, the encoder-decoder based FCN architecture remains unchanged. In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer (i.e., without convolution and resolution reduction) to encode an image as a sequence of patches. With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR). Extensive experiments show that SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes. Particularly, we achieve the first position in the highly competitive ADE20K test server leaderboard on the day of submission.
Persistent Identifier: http://hdl.handle.net/10722/333514
ISSN: 1063-6919
2023 SCImago Journal Rankings: 10.331
ISI Accession Number: WOS:000739917307010
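The abstract's central idea, encoding an image as a sequence of flattened patches with no convolution or resolution reduction, can be sketched in a few lines. This is a minimal illustration only: the 16x16 patch size matches common ViT-style encoders, and the helper name is an assumption, not code from the paper.

```python
import numpy as np

def image_to_patch_sequence(image, patch_size=16):
    """Flatten an H x W x C image into a sequence of patch vectors,
    the input format a pure transformer encoder consumes.
    Illustrative sketch; not the authors' implementation."""
    H, W, C = image.shape
    assert H % patch_size == 0 and W % patch_size == 0
    ph, pw = H // patch_size, W // patch_size
    # Cut into a (ph, pw) grid of patches, then flatten each patch.
    patches = (image
               .reshape(ph, patch_size, pw, patch_size, C)
               .transpose(0, 2, 1, 3, 4)
               .reshape(ph * pw, patch_size * patch_size * C))
    return patches

img = np.zeros((512, 512, 3), dtype=np.float32)
seq = image_to_patch_sequence(img)
print(seq.shape)  # (1024, 768): a 32x32 grid of patches, each 16*16*3 values
```

Each of the 1024 rows becomes one token, so every transformer layer attends over the whole image, which is the global context modeling the abstract emphasizes.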

 

dc.contributor.author: Zheng, Sixiao
dc.contributor.author: Lu, Jiachen
dc.contributor.author: Zhao, Hengshuang
dc.contributor.author: Zhu, Xiatian
dc.contributor.author: Luo, Zekun
dc.contributor.author: Wang, Yabiao
dc.contributor.author: Fu, Yanwei
dc.contributor.author: Feng, Jianfeng
dc.contributor.author: Xiang, Tao
dc.contributor.author: Torr, Philip H.S.
dc.contributor.author: Zhang, Li
dc.date.accessioned: 2023-10-06T05:20:05Z
dc.date.available: 2023-10-06T05:20:05Z
dc.date.issued: 2021
dc.identifier.citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2021, p. 6877-6886
dc.identifier.issn: 1063-6919
dc.identifier.uri: http://hdl.handle.net/10722/333514
dc.description.abstract: Most recent semantic segmentation methods adopt a fully-convolutional network (FCN) with an encoder-decoder architecture. The encoder progressively reduces the spatial resolution and learns more abstract/semantic visual concepts with larger receptive fields. Since context modeling is critical for segmentation, the latest efforts have been focused on increasing the receptive field, through either dilated/atrous convolutions or inserting attention modules. However, the encoder-decoder based FCN architecture remains unchanged. In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task. Specifically, we deploy a pure transformer (i.e., without convolution and resolution reduction) to encode an image as a sequence of patches. With the global context modeled in every layer of the transformer, this encoder can be combined with a simple decoder to provide a powerful segmentation model, termed SEgmentation TRansformer (SETR). Extensive experiments show that SETR achieves new state of the art on ADE20K (50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on Cityscapes. Particularly, we achieve the first position in the highly competitive ADE20K test server leaderboard on the day of submission.
dc.language: eng
dc.relation.ispartof: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
dc.title: Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
dc.type: Conference_Paper
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1109/CVPR46437.2021.00681
dc.identifier.scopus: eid_2-s2.0-85117131558
dc.identifier.spage: 6877
dc.identifier.epage: 6886
dc.identifier.isi: WOS:000739917307010
