File Download
There are no files associated with this item.
Links for fulltext (may require subscription)
- Publisher Website: 10.1109/TPAMI.2022.3159581
- Scopus: eid_2-s2.0-85126511871
- PMID: 35294341
- WOS: WOS:000912386000003
Article: Adaptive Perspective Distillation for Semantic Segmentation
Title | Adaptive Perspective Distillation for Semantic Segmentation |
---|---|
Authors | Tian, Zhuotao; Chen, Pengguang; Lai, Xin; Jiang, Li; Liu, Shu; Zhao, Hengshuang; Yu, Bei; Yang, Ming Chang; Jia, Jiaya |
Keywords | Knowledge distillation; scene understanding; semantic segmentation |
Issue Date | 2023 |
Citation | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, v. 45, n. 2, p. 1372-1387 |
Abstract | Strong semantic segmentation models require large backbones to achieve promising performance, making them hard to adapt to real applications where efficient real-time algorithms are needed. Knowledge distillation tackles this issue by letting the smaller model (student) produce pixel-wise predictions similar to those of a larger model (teacher). However, the classifier, which can be viewed as the perspective through which models perceive the encoded features to yield observations (i.e., predictions), is shared by all training samples and fits a universal feature distribution. Since, at a fixed capacity, good generalization to the entire distribution can come at the cost of poor specialization to individual samples, the shared universal perspective often overlooks details in each sample, degrading knowledge distillation. In this paper, we propose Adaptive Perspective Distillation (APD), which creates an adaptive local perspective for each individual training sample. It extracts detailed contextual information from each training sample specifically, mining more details from the teacher and thus achieving better distillation results on the student. APD places no structural constraints on either the teacher or the student model and therefore generalizes well to different semantic segmentation models. Extensive experiments on Cityscapes, ADE20K, and PASCAL-Context demonstrate the effectiveness of the proposed APD. Moreover, APD yields favorable performance gains in both object detection and instance segmentation without bells and whistles. |
Persistent Identifier | http://hdl.handle.net/10722/332255 |
ISSN | 0162-8828 (2023 Impact Factor: 20.8; 2023 SCImago Journal Rank: 6.158) |
ISI Accession Number ID | WOS:000912386000003 |
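The abstract above builds on the standard pixel-wise knowledge distillation baseline, in which the student is trained to match the teacher's per-pixel class distributions. The PyTorch sketch below illustrates only that generic baseline loss; the adaptive per-sample perspective that APD contributes cannot be reconstructed from this record alone, and the function name `pixelwise_kd_loss` and the temperature value are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def pixelwise_kd_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 4.0) -> torch.Tensor:
    """Per-pixel KL divergence between softened teacher and student predictions.

    Both inputs are raw class scores of shape (N, C, H, W). The temperature
    and the overall formulation follow generic KD practice, not APD itself.
    """
    n, c, h, w = student_logits.shape
    # Treat every pixel as an independent distillation target: (N*H*W, C)
    s = student_logits.permute(0, 2, 3, 1).reshape(-1, c)
    t = teacher_logits.permute(0, 2, 3, 1).reshape(-1, c)
    log_p_student = F.log_softmax(s / temperature, dim=1)
    p_teacher = F.softmax(t / temperature, dim=1)
    # Scale by T^2 so gradients keep comparable magnitude across temperatures
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Illustrative usage with random tensors standing in for model outputs
student_out = torch.randn(2, 19, 64, 64)   # e.g., 19 Cityscapes classes
teacher_out = torch.randn(2, 19, 64, 64)
loss = pixelwise_kd_loss(student_out, teacher_out)
```

In this baseline, one shared classifier head produces the distributions matched at every pixel of every sample; APD's stated contribution is to replace that single "universal perspective" with a perspective adapted to each individual training sample.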
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Tian, Zhuotao | - |
dc.contributor.author | Chen, Pengguang | - |
dc.contributor.author | Lai, Xin | - |
dc.contributor.author | Jiang, Li | - |
dc.contributor.author | Liu, Shu | - |
dc.contributor.author | Zhao, Hengshuang | - |
dc.contributor.author | Yu, Bei | - |
dc.contributor.author | Yang, Ming Chang | - |
dc.contributor.author | Jia, Jiaya | - |
dc.date.accessioned | 2023-10-06T05:10:04Z | - |
dc.date.available | 2023-10-06T05:10:04Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, v. 45, n. 2, p. 1372-1387 | - |
dc.identifier.issn | 0162-8828 | - |
dc.identifier.uri | http://hdl.handle.net/10722/332255 | - |
dc.description.abstract | Strong semantic segmentation models require large backbones to achieve promising performance, making them hard to adapt to real applications where efficient real-time algorithms are needed. Knowledge distillation tackles this issue by letting the smaller model (student) produce pixel-wise predictions similar to those of a larger model (teacher). However, the classifier, which can be viewed as the perspective through which models perceive the encoded features to yield observations (i.e., predictions), is shared by all training samples and fits a universal feature distribution. Since, at a fixed capacity, good generalization to the entire distribution can come at the cost of poor specialization to individual samples, the shared universal perspective often overlooks details in each sample, degrading knowledge distillation. In this paper, we propose Adaptive Perspective Distillation (APD), which creates an adaptive local perspective for each individual training sample. It extracts detailed contextual information from each training sample specifically, mining more details from the teacher and thus achieving better distillation results on the student. APD places no structural constraints on either the teacher or the student model and therefore generalizes well to different semantic segmentation models. Extensive experiments on Cityscapes, ADE20K, and PASCAL-Context demonstrate the effectiveness of the proposed APD. Moreover, APD yields favorable performance gains in both object detection and instance segmentation without bells and whistles. | -
dc.language | eng | - |
dc.relation.ispartof | IEEE Transactions on Pattern Analysis and Machine Intelligence | - |
dc.subject | Knowledge distillation | - |
dc.subject | scene understanding | - |
dc.subject | semantic segmentation | - |
dc.title | Adaptive Perspective Distillation for Semantic Segmentation | - |
dc.type | Article | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1109/TPAMI.2022.3159581 | - |
dc.identifier.pmid | 35294341 | - |
dc.identifier.scopus | eid_2-s2.0-85126511871 | - |
dc.identifier.volume | 45 | - |
dc.identifier.issue | 2 | - |
dc.identifier.spage | 1372 | - |
dc.identifier.epage | 1387 | - |
dc.identifier.eissn | 1939-3539 | - |
dc.identifier.isi | WOS:000912386000003 | - |