Conference Paper: Kalman normalization: Normalizing internal representations across network layers
Title | Kalman normalization: Normalizing internal representations across network layers |
---|---|
Authors | Wang, Guangrun; Peng, Jiefeng; Luo, Ping; Wang, Xinjiang; Lin, Liang |
Issue Date | 2018 |
Publisher | Neural Information Processing Systems Foundation. The conference proceedings' web site is located at https://papers.nips.cc/book/advances-in-neural-information-processing-systems-31-2018 |
Citation | Neural Information Processing Systems 2018, Montreal, Canada, 2-8 December 2018. In Advances in Neural Information Processing Systems 31 (NIPS 2018), 2018, p. 21-31 |
Abstract | © 2018 Curran Associates Inc. All rights reserved. As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches by normalizing the distribution of the internal representation of each hidden layer. However, the effectiveness of BN diminishes in the micro-batch scenario (e.g. fewer than 4 samples in a mini-batch), since the statistics estimated from a mini-batch are unreliable with insufficient samples. This limits BN's applicability to training larger models on segmentation, detection, and video-related problems, which require small batches constrained by memory consumption. In this paper, we present a novel normalization method, called Kalman Normalization (KN), for improving and accelerating the training of DNNs, particularly in the context of micro-batches. Specifically, unlike existing solutions that treat each hidden layer as an isolated system, KN treats all the layers in a network as a whole system and estimates the statistics of a given layer by considering the distributions of all its preceding layers, mimicking the merits of Kalman Filtering. On ResNet50 trained on ImageNet, KN has 3.4% lower error than its BN counterpart when using a batch size of 4; even when using typical batch sizes, KN still maintains an advantage over BN while other BN variants suffer a performance degradation. Moreover, KN can be naturally generalized to many existing normalization variants to obtain gains, e.g. equipping Group Normalization [34] with Group Kalman Normalization (GKN). KN can outperform BN and its variants for the large-scale object detection and segmentation task on COCO 2017. |
Persistent Identifier | http://hdl.handle.net/10722/273747 |
ISSN | 1049-5258 (2020 SCImago Journal Rankings: 1.399) |
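
The abstract above describes KN's key mechanism: a layer's normalization statistics are estimated by blending its own mini-batch statistics with statistics propagated from the preceding layer, in the spirit of a Kalman filter update. The PyTorch sketch below is only an illustrative reading of that description; the module name `KalmanNorm2d`, the sigmoid-parameterized gain, the per-channel transition matrix, and the simplified handling of variances and running statistics are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of Kalman Normalization (KN) as described in the abstract:
# the statistics of layer k are estimated by blending the current mini-batch
# statistics with statistics propagated from the preceding layer, Kalman-style.
# Names and parameterization here are illustrative assumptions.
import torch
import torch.nn as nn


class KalmanNorm2d(nn.Module):
    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        super().__init__()
        self.eps = eps
        self.momentum = momentum
        # Affine parameters, as in standard BatchNorm.
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        # Learnable "transition" mapping the previous layer's statistics into
        # this layer's space, and a gain in (0, 1) blending observation vs prior.
        self.transition = nn.Parameter(torch.eye(num_features))
        self.gain_logit = nn.Parameter(torch.zeros(1))
        # Running estimates used at inference time.
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, x, prev_mean=None, prev_var=None):
        # x: (N, C, H, W); prev_mean/prev_var: estimated stats of the preceding layer.
        if self.training:
            obs_mean = x.mean(dim=(0, 2, 3))
            obs_var = x.var(dim=(0, 2, 3), unbiased=False)
            if prev_mean is None:
                mean, var = obs_mean, obs_var
            else:
                # Prior predicted from the previous layer's estimate; for the
                # variance we use the diagonal approximation Var(Ax) ~ (A*A) var(x).
                prior_mean = self.transition @ prev_mean
                prior_var = (self.transition ** 2) @ prev_var
                # Kalman-style blend of the noisy mini-batch observation and the prior.
                q = torch.sigmoid(self.gain_logit)
                mean = q * obs_mean + (1 - q) * prior_mean
                var = q * obs_var + (1 - q) * prior_var
            with torch.no_grad():
                self.running_mean.lerp_(mean.detach(), self.momentum)
                self.running_var.lerp_(var.detach(), self.momentum)
        else:
            # Simplification: at inference we fall back to running estimates.
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean[None, :, None, None]) / torch.sqrt(var[None, :, None, None] + self.eps)
        out = x_hat * self.weight[None, :, None, None] + self.bias[None, :, None, None]
        # Return the estimated stats so the next layer can use them as its prior.
        return out, mean.detach(), var.detach()
```

In use, each KN layer would return its estimated statistics so the next layer can consume them as a prior, e.g. `h1, m, v = kn1(x)` followed by `h2, m, v = kn2(conv(h1), m, v)`, which is how the "whole system" view in the abstract could be threaded through a network.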
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Wang, Guangrun | - |
dc.contributor.author | Peng, Jiefeng | - |
dc.contributor.author | Luo, Ping | - |
dc.contributor.author | Wang, Xinjiang | - |
dc.contributor.author | Lin, Liang | - |
dc.date.accessioned | 2019-08-12T09:56:33Z | - |
dc.date.available | 2019-08-12T09:56:33Z | - |
dc.date.issued | 2018 | - |
dc.identifier.citation | Neural Information Processing Systems 2018, Montreal, Canada, 2-8 December 2018. In Advances in Neural Information Processing Systems 31 (NIPS 2018), 2018, p. 21-31 | - |
dc.identifier.issn | 1049-5258 | - |
dc.identifier.uri | http://hdl.handle.net/10722/273747 | - |
dc.description.abstract | © 2018 Curran Associates Inc. All rights reserved. As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches by normalizing the distribution of the internal representation of each hidden layer. However, the effectiveness of BN diminishes in the micro-batch scenario (e.g. fewer than 4 samples in a mini-batch), since the statistics estimated from a mini-batch are unreliable with insufficient samples. This limits BN's applicability to training larger models on segmentation, detection, and video-related problems, which require small batches constrained by memory consumption. In this paper, we present a novel normalization method, called Kalman Normalization (KN), for improving and accelerating the training of DNNs, particularly in the context of micro-batches. Specifically, unlike existing solutions that treat each hidden layer as an isolated system, KN treats all the layers in a network as a whole system and estimates the statistics of a given layer by considering the distributions of all its preceding layers, mimicking the merits of Kalman Filtering. On ResNet50 trained on ImageNet, KN has 3.4% lower error than its BN counterpart when using a batch size of 4; even when using typical batch sizes, KN still maintains an advantage over BN while other BN variants suffer a performance degradation. Moreover, KN can be naturally generalized to many existing normalization variants to obtain gains, e.g. equipping Group Normalization [34] with Group Kalman Normalization (GKN). KN can outperform BN and its variants for the large-scale object detection and segmentation task on COCO 2017. | -
dc.language | eng | - |
dc.publisher | Neural Information Processing Systems Foundation. The conference proceedings' web site is located at https://papers.nips.cc/book/advances-in-neural-information-processing-systems-31-2018 | - |
dc.relation.ispartof | Advances in Neural Information Processing Systems 31 (NIPS 2018) | - |
dc.title | Kalman normalization: Normalizing internal representations across network layers | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_OA_fulltext | - |
dc.identifier.scopus | eid_2-s2.0-85064819672 | - |
dc.identifier.spage | 21 | - |
dc.identifier.epage | 31 | - |
dc.identifier.issnl | 1049-5258 | - |