Conference Paper: Kalman normalization: Normalizing internal representations across network layers
Title | Kalman normalization: Normalizing internal representations across network layers |
---|---|
Authors | Wang, Guangrun; Peng, Jiefeng; Luo, Ping; Wang, Xinjiang; Lin, Liang |
Issue Date | 2018 |
Publisher | Neural Information Processing Systems Foundation. The conference proceedings' web site is located at https://papers.nips.cc/book/advances-in-neural-information-processing-systems-31-2018 |
Citation | Neural Information Processing Systems 2018, Montreal, Canada, 2-8 December 2018. In Advances in Neural Information Processing Systems 31 (NIPS 2018), 2018, p. 21-31 |
Abstract | © 2018 Curran Associates Inc. All rights reserved. As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches by normalizing the distribution of the internal representation of each hidden layer. However, the effectiveness of BN diminishes in the micro-batch scenario (e.g. fewer than 4 samples in a mini-batch), since the statistics estimated from a mini-batch are unreliable with insufficient samples. This limits BN's applicability to training larger models on segmentation, detection, and video-related problems, which require small batches constrained by memory consumption. In this paper, we present a novel normalization method, called Kalman Normalization (KN), for improving and accelerating the training of DNNs, particularly in the context of micro-batches. Specifically, unlike existing solutions that treat each hidden layer as an isolated system, KN treats all the layers in a network as a whole system and estimates the statistics of a given layer by considering the distributions of all its preceding layers, mimicking the merits of Kalman Filtering. On ResNet50 trained on ImageNet, KN has 3.4% lower error than its BN counterpart when using a batch size of 4; even when using typical batch sizes, KN still maintains an advantage over BN while other BN variants suffer a performance degradation. Moreover, KN can be naturally generalized to many existing normalization variants to obtain gains, e.g. equipping Group Normalization [34] with Group Kalman Normalization (GKN). KN can outperform BN and its variants for the large-scale object detection and segmentation task on COCO 2017. |
Persistent Identifier | http://hdl.handle.net/10722/273747 |
ISSN | 1049-5258 (2020 SCImago Journal Rankings: 1.399) |
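
The abstract above describes KN's key mechanism: a layer's normalization statistics are estimated by blending its own mini-batch statistics with statistics propagated from the preceding layer, in the spirit of a Kalman filter update. The PyTorch sketch below is only an illustrative reading of that description; the module name `KalmanNorm2d`, the sigmoid-parameterized gain, the per-channel transition matrix, and the simplified handling of variances and running statistics are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of Kalman Normalization (KN) as described in the abstract:
# the statistics of layer k are estimated by blending the current mini-batch
# statistics with statistics propagated from the preceding layer, Kalman-style.
# Names and parameterization here are illustrative assumptions.
import torch
import torch.nn as nn


class KalmanNorm2d(nn.Module):
    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        super().__init__()
        self.eps = eps
        self.momentum = momentum
        # Affine parameters, as in standard BatchNorm.
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        # Learnable "transition" mapping the previous layer's statistics into
        # this layer's space, and a gain in (0, 1) blending observation vs prior.
        self.transition = nn.Parameter(torch.eye(num_features))
        self.gain_logit = nn.Parameter(torch.zeros(1))
        # Running estimates used at inference time.
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, x, prev_mean=None, prev_var=None):
        # x: (N, C, H, W); prev_mean/prev_var: estimated stats of the preceding layer.
        if self.training:
            obs_mean = x.mean(dim=(0, 2, 3))
            obs_var = x.var(dim=(0, 2, 3), unbiased=False)
            if prev_mean is None:
                mean, var = obs_mean, obs_var
            else:
                # Prior predicted from the previous layer's estimate; for the
                # variance we use the diagonal approximation Var(Ax) ~ (A*A) var(x).
                prior_mean = self.transition @ prev_mean
                prior_var = (self.transition ** 2) @ prev_var
                # Kalman-style blend of the noisy mini-batch observation and the prior.
                q = torch.sigmoid(self.gain_logit)
                mean = q * obs_mean + (1 - q) * prior_mean
                var = q * obs_var + (1 - q) * prior_var
            with torch.no_grad():
                self.running_mean.lerp_(mean.detach(), self.momentum)
                self.running_var.lerp_(var.detach(), self.momentum)
        else:
            # Simplification: at inference we fall back to running estimates.
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean[None, :, None, None]) / torch.sqrt(var[None, :, None, None] + self.eps)
        out = x_hat * self.weight[None, :, None, None] + self.bias[None, :, None, None]
        # Return the estimated stats so the next layer can use them as its prior.
        return out, mean.detach(), var.detach()
```

In use, each KN layer would return its estimated statistics so the next layer can consume them as a prior, e.g. `h1, m, v = kn1(x)` followed by `h2, m, v = kn2(conv(h1), m, v)`, which is how the "whole system" view in the abstract could be threaded through a network.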
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Wang, Guangrun | - |
dc.contributor.author | Peng, Jiefeng | - |
dc.contributor.author | Luo, Ping | - |
dc.contributor.author | Wang, Xinjiang | - |
dc.contributor.author | Lin, Liang | - |
dc.date.accessioned | 2019-08-12T09:56:33Z | - |
dc.date.available | 2019-08-12T09:56:33Z | - |
dc.date.issued | 2018 | - |
dc.identifier.citation | Neural Information Processing Systems 2018, Montreal, Canada, 2-8 December 2018. In Advances in Neural Information Processing Systems 31 (NIPS 2018), 2018, p. 21-31 | - |
dc.identifier.issn | 1049-5258 | - |
dc.identifier.uri | http://hdl.handle.net/10722/273747 | - |
dc.description.abstract | © 2018 Curran Associates Inc. All rights reserved. As an indispensable component, Batch Normalization (BN) has successfully improved the training of deep neural networks (DNNs) with mini-batches by normalizing the distribution of the internal representation of each hidden layer. However, the effectiveness of BN diminishes in the micro-batch scenario (e.g. fewer than 4 samples in a mini-batch), since the statistics estimated from a mini-batch are unreliable with insufficient samples. This limits BN's applicability to training larger models on segmentation, detection, and video-related problems, which require small batches constrained by memory consumption. In this paper, we present a novel normalization method, called Kalman Normalization (KN), for improving and accelerating the training of DNNs, particularly in the context of micro-batches. Specifically, unlike existing solutions that treat each hidden layer as an isolated system, KN treats all the layers in a network as a whole system and estimates the statistics of a given layer by considering the distributions of all its preceding layers, mimicking the merits of Kalman Filtering. On ResNet50 trained on ImageNet, KN has 3.4% lower error than its BN counterpart when using a batch size of 4; even when using typical batch sizes, KN still maintains an advantage over BN while other BN variants suffer a performance degradation. Moreover, KN can be naturally generalized to many existing normalization variants to obtain gains, e.g. equipping Group Normalization [34] with Group Kalman Normalization (GKN). KN can outperform BN and its variants for the large-scale object detection and segmentation task on COCO 2017. | -
dc.language | eng | - |
dc.publisher | Neural Information Processing Systems Foundation. The conference proceedings' web site is located at https://papers.nips.cc/book/advances-in-neural-information-processing-systems-31-2018 | - |
dc.relation.ispartof | Advances in Neural Information Processing Systems 31 (NIPS 2018) | - |
dc.title | Kalman normalization: Normalizing internal representations across network layers | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_OA_fulltext | - |
dc.identifier.scopus | eid_2-s2.0-85064819672 | - |
dc.identifier.spage | 21 | - |
dc.identifier.epage | 31 | - |
dc.identifier.issnl | 1049-5258 | - |