Conference Paper: Kalman normalization: Normalizing internal representations across network layers

Title: Kalman normalization: Normalizing internal representations across network layers
Authors: Wang, Guangrun; Peng, Jiefeng; Luo, Ping; Wang, Xinjiang; Lin, Liang
Issue Date: 2018
Publisher: Neural Information Processing Systems Foundation. The conference proceedings' web site is located at https://papers.nips.cc/book/advances-in-neural-information-processing-systems-31-2018
Citation: Neural Information Processing Systems 2018, Montreal, Canada, 2-8 December 2018. In Advances in Neural Information Processing Systems 31 (NIPS 2018), 2018, p. 21-31
Abstract: © 2018 Curran Associates, Inc. All rights reserved. As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches by normalizing the distribution of the internal representation for each hidden layer. However, the effectiveness of BN diminishes in the micro-batch scenario (e.g. fewer than 4 samples in a mini-batch), since the statistics estimated from a mini-batch are not reliable with insufficient samples. This limits BN's applicability to training larger models on segmentation, detection, and video-related problems, which require small batches constrained by memory consumption. In this paper, we present a novel normalization method, called Kalman Normalization (KN), for improving and accelerating the training of DNNs, particularly in the context of micro-batches. Specifically, unlike existing solutions that treat each hidden layer as an isolated system, KN treats all the layers in a network as a whole system and estimates the statistics of a certain layer by considering the distributions of all its preceding layers, mimicking the merits of Kalman filtering. On ResNet50 trained on ImageNet, KN has 3.4% lower error than its BN counterpart when using a batch size of 4; even when using typical batch sizes, KN still maintains an advantage over BN, while other BN variants suffer a performance degradation. Moreover, KN can be naturally generalized to many existing normalization variants to obtain gains, e.g. equipping Group Normalization [34] with Group Kalman Normalization (GKN). KN can outperform BN and its variants on large-scale object detection and segmentation tasks in COCO 2017.
Persistent Identifier: http://hdl.handle.net/10722/273747
ISSN: 1049-5258
2020 SCImago Journal Rankings: 1.399
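The abstract describes KN only at a high level: a layer's statistics are estimated by fusing its own (possibly noisy) micro-batch statistics with statistics propagated from the preceding layer. The snippet below is a minimal, hypothetical PyTorch sketch of that idea, not the authors' algorithm; the class name KalmanNormSketch, the learnable transition map, and the fixed scalar gain are simplifications introduced here for illustration.

# Illustrative sketch in the spirit of Kalman Normalization (KN).
# NOT the paper's exact formulation: the transition matrix, gain computation,
# and bias-correction terms are simplified to a learnable linear transition
# and a fixed scalar gain.
import torch
import torch.nn as nn


class KalmanNormSketch(nn.Module):
    """Normalizes activations with statistics fused across layers.

    The mean/variance observed at this layer (unreliable for micro-batches)
    are blended with the statistics estimated at the previous layer,
    mimicking a Kalman-filter update with a scalar gain.
    """

    def __init__(self, num_features: int, gain: float = 0.5, eps: float = 1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))
        # Hypothetical "state transition" mapping previous-layer statistics
        # into this layer's feature space.
        self.transition = nn.Linear(num_features, num_features, bias=False)
        self.gain = gain
        self.eps = eps

    def forward(self, x, prev_mean=None, prev_var=None):
        # Observed micro-batch statistics of this layer (x: [batch, features]).
        obs_mean = x.mean(dim=0)
        obs_var = x.var(dim=0, unbiased=False)

        if prev_mean is None:
            mean, var = obs_mean, obs_var
        else:
            # Predict this layer's statistics from the previous layer's,
            # then correct with the local observation (Kalman-style blend).
            pred_mean = self.transition(prev_mean)
            pred_var = self.transition(prev_var).abs()
            mean = pred_mean + self.gain * (obs_mean - pred_mean)
            var = pred_var + self.gain * (obs_var - pred_var)

        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        # Return the fused statistics so the next layer can condition on them.
        return self.gamma * x_hat + self.beta, mean.detach(), var.detach()


# Example: chaining two layers on a micro-batch of 4, passing statistics forward.
layer1, layer2 = KalmanNormSketch(64), KalmanNormSketch(64)
y1, m1, v1 = layer1(torch.randn(4, 64))
y2, m2, v2 = layer2(torch.randn(4, 64), m1, v1)

In use, each normalization layer passes its fused mean and variance forward so the next layer can condition its own estimate on them; equal feature sizes across layers are assumed here only to keep the sketch short.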

 

DC Field: Value
dc.contributor.author: Wang, Guangrun
dc.contributor.author: Peng, Jiefeng
dc.contributor.author: Luo, Ping
dc.contributor.author: Wang, Xinjiang
dc.contributor.author: Lin, Liang
dc.date.accessioned: 2019-08-12T09:56:33Z
dc.date.available: 2019-08-12T09:56:33Z
dc.date.issued: 2018
dc.identifier.citation: Neural Information Processing Systems 2018, Montreal, Canada, 2-8 December 2018. In Advances in Neural Information Processing Systems 31 (NIPS 2018), 2018, p. 21-31
dc.identifier.issn: 1049-5258
dc.identifier.uri: http://hdl.handle.net/10722/273747
dc.description.abstract: © 2018 Curran Associates, Inc. All rights reserved. As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches by normalizing the distribution of the internal representation for each hidden layer. However, the effectiveness of BN diminishes in the micro-batch scenario (e.g. fewer than 4 samples in a mini-batch), since the statistics estimated from a mini-batch are not reliable with insufficient samples. This limits BN's applicability to training larger models on segmentation, detection, and video-related problems, which require small batches constrained by memory consumption. In this paper, we present a novel normalization method, called Kalman Normalization (KN), for improving and accelerating the training of DNNs, particularly in the context of micro-batches. Specifically, unlike existing solutions that treat each hidden layer as an isolated system, KN treats all the layers in a network as a whole system and estimates the statistics of a certain layer by considering the distributions of all its preceding layers, mimicking the merits of Kalman filtering. On ResNet50 trained on ImageNet, KN has 3.4% lower error than its BN counterpart when using a batch size of 4; even when using typical batch sizes, KN still maintains an advantage over BN, while other BN variants suffer a performance degradation. Moreover, KN can be naturally generalized to many existing normalization variants to obtain gains, e.g. equipping Group Normalization [34] with Group Kalman Normalization (GKN). KN can outperform BN and its variants on large-scale object detection and segmentation tasks in COCO 2017.
dc.language: eng
dc.publisher: Neural Information Processing Systems Foundation. The conference proceedings' web site is located at https://papers.nips.cc/book/advances-in-neural-information-processing-systems-31-2018
dc.relation.ispartof: Advances in Neural Information Processing Systems 31 (NIPS 2018)
dc.title: Kalman normalization: Normalizing internal representations across network layers
dc.type: Conference_Paper
dc.description.nature: link_to_OA_fulltext
dc.identifier.scopus: eid_2-s2.0-85064819672
dc.identifier.spage: 21
dc.identifier.epage: 31
dc.identifier.issnl: 1049-5258
