Article: High-level Feature Guided Decoding for Semantic Segmentation

Title: High-level Feature Guided Decoding for Semantic Segmentation
Authors: Huang, Ye; Kang, Di; Gao, Shenghua; Li, Wen; Duan, Lixin
Keywords: Circuits and systems; Cityscapes; Decoding; Feature extraction; Representation learning; Semantic segmentation; Spatial resolution; Task analysis; Training
Issue Date: 2024
Citation: IEEE Transactions on Circuits and Systems for Video Technology, 2024
Abstract: Existing pyramid-based upsamplers (e.g., SemanticFPN), although efficient, usually produce less accurate results than dilation-based models when using the same backbone. This is partially caused by the contaminated high-level features, since they are fused and fine-tuned with noisy low-level features on limited data. To address this issue, we propose to use powerful pre-trained high-level features as guidance (HFG) so that the upsampler can produce robust results. Specifically, only the high-level features from the backbone are used to train the class tokens, which are then reused by the upsampler for classification, guiding the upsampler features toward the more discriminative backbone features. One crucial design of the HFG is to protect the high-level features from being contaminated by using proper stop-gradient operations, so that the backbone does not update according to the noisy gradient from the upsampler. To push the upper limit of HFG, we introduce a context augmentation encoder (CAE) that can efficiently and effectively operate on the low-resolution high-level feature, resulting in an improved representation and thus better guidance. We name our complete solution the High-Level Features Guided Decoder (HFGD). We evaluate the proposed HFGD on three benchmarks: Pascal Context, COCOStuff164k, and Cityscapes. HFGD achieves state-of-the-art results among methods that do not use extra training data, demonstrating its effectiveness and generalization ability.
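
The guidance mechanism described in the abstract can be made concrete with a short sketch. Below is a minimal PyTorch illustration of the HFG idea only: class tokens are trained on the clean high-level features, the decoder consumes a detached (stop-gradient) copy of those features, and the same tokens are reused to classify the decoder output. The class names, channel sizes, toy upsampler, and test shapes here are assumptions for illustration, not the authors' HFGD implementation; see the DOI below for the actual design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HFGSketch(nn.Module):
        # Illustrative sketch only: the module layout below is an assumption,
        # not the HFGD architecture from the paper.
        def __init__(self, num_classes=19, high_ch=512, low_ch=256):
            super().__init__()
            # Class tokens: one learnable embedding per class, trained
            # against the backbone's high-level features only.
            self.class_tokens = nn.Parameter(torch.randn(num_classes, high_ch))
            # A toy stand-in for a pyramid-based upsampler/decoder.
            self.reduce_low = nn.Conv2d(low_ch, high_ch, kernel_size=1)
            self.fuse = nn.Conv2d(high_ch, high_ch, kernel_size=3, padding=1)

        def classify(self, feat):
            # Per-pixel dot product with the class tokens:
            # (K, C) x (B, C, H, W) -> (B, K, H, W) logits.
            return torch.einsum("kc,bchw->bkhw", self.class_tokens, feat)

        def forward(self, high, low):
            # 1) Auxiliary logits from the clean high-level features; the
            #    loss on these trains the class tokens on uncontaminated
            #    backbone features.
            aux_logits = self.classify(high)
            # 2) Stop-gradient: the decoder consumes a detached copy, so its
            #    noisy gradient cannot flow back into the backbone features.
            x = F.interpolate(high.detach(), scale_factor=2,
                              mode="bilinear", align_corners=False)
            x = self.fuse(x + self.reduce_low(low))
            # 3) Reuse the same class tokens to classify the decoder output,
            #    guiding it toward the more discriminative backbone space.
            main_logits = self.classify(x)
            return main_logits, aux_logits

    # Smoke test with made-up shapes: high-level at 1/32, low-level at 1/16.
    high = torch.randn(2, 512, 16, 16)
    low = torch.randn(2, 256, 32, 32)
    main_logits, aux_logits = HFGSketch()(high, low)
    print(main_logits.shape, aux_logits.shape)  # (2, 19, 32, 32) (2, 19, 16, 16)

The essential design point is the single high.detach() call: without it, the loss on main_logits would backpropagate through the decoder into the backbone, which is exactly the feature contamination the abstract describes.
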
Persistent Identifier: http://hdl.handle.net/10722/345383
ISSN: 1051-8215
2023 Impact Factor: 8.3
2023 SCImago Journal Rankings: 2.299


DC Field: Value
dc.contributor.author: Huang, Ye
dc.contributor.author: Kang, Di
dc.contributor.author: Gao, Shenghua
dc.contributor.author: Li, Wen
dc.contributor.author: Duan, Lixin
dc.date.accessioned: 2024-08-15T09:27:00Z
dc.date.available: 2024-08-15T09:27:00Z
dc.date.issued: 2024
dc.identifier.citation: IEEE Transactions on Circuits and Systems for Video Technology, 2024
dc.identifier.issn: 1051-8215
dc.identifier.uri: http://hdl.handle.net/10722/345383
dc.description.abstract: Existing pyramid-based upsamplers (e.g., SemanticFPN), although efficient, usually produce less accurate results than dilation-based models when using the same backbone. This is partially caused by the contaminated high-level features, since they are fused and fine-tuned with noisy low-level features on limited data. To address this issue, we propose to use powerful pre-trained high-level features as guidance (HFG) so that the upsampler can produce robust results. Specifically, only the high-level features from the backbone are used to train the class tokens, which are then reused by the upsampler for classification, guiding the upsampler features toward the more discriminative backbone features. One crucial design of the HFG is to protect the high-level features from being contaminated by using proper stop-gradient operations, so that the backbone does not update according to the noisy gradient from the upsampler. To push the upper limit of HFG, we introduce a context augmentation encoder (CAE) that can efficiently and effectively operate on the low-resolution high-level feature, resulting in an improved representation and thus better guidance. We name our complete solution the High-Level Features Guided Decoder (HFGD). We evaluate the proposed HFGD on three benchmarks: Pascal Context, COCOStuff164k, and Cityscapes. HFGD achieves state-of-the-art results among methods that do not use extra training data, demonstrating its effectiveness and generalization ability.
dc.language: eng
dc.relation.ispartof: IEEE Transactions on Circuits and Systems for Video Technology
dc.subject: Circuits and systems
dc.subject: Cityscapes
dc.subject: Decoding
dc.subject: Feature extraction
dc.subject: Representation Learning
dc.subject: Semantic segmentation
dc.subject: Semantic Segmentation
dc.subject: Spatial resolution
dc.subject: Task analysis
dc.subject: Training
dc.title: High-level Feature Guided Decoding for Semantic Segmentation
dc.type: Article
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1109/TCSVT.2024.3393632
dc.identifier.scopus: eid_2-s2.0-85191715595
dc.identifier.eissn: 1558-2205
