Conference Paper: Generalization bounds of stochastic gradient descent for wide and deep neural networks

Title: Generalization bounds of stochastic gradient descent for wide and deep neural networks
Authors: Cao, Yuan; Gu, Quanquan
Issue Date: 2019
Citation: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 8-14 December 2019. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2020
Abstract: We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected 0-1 loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a neural tangent random feature (NTRF) model. For data distributions that can be classified by the NTRF model with sufficiently small error, our result yields a generalization error bound of order $\tilde{O}(n^{-1/2})$ that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work.
Persistent Identifier: http://hdl.handle.net/10722/303679
ISSN: 1049-5258
2020 SCImago Journal Rankings: 1.399
ISI Accession Number ID: WOS:000535866902046
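
The abstract's central object, the neural tangent random feature (NTRF) model, uses the gradient of the network output with respect to the weights at random initialization as a fixed feature map and then fits a linear model on those features. The sketch below is not the authors' code; it is a minimal illustration for a two-layer ReLU network of hypothetical width m with fixed second-layer signs, where the NTRF features can be written out by hand and a toy linear predictor is fit on them.

```python
# Minimal NTRF sketch (illustrative only, not the paper's implementation):
# the feature map of an input x is the gradient of the network output with
# respect to the first-layer weights at random initialization.
import numpy as np

rng = np.random.default_rng(0)
d, m = 10, 2048                      # input dimension, network width (assumed values)

W0 = rng.standard_normal((m, d))     # random first-layer initialization
a = rng.choice([-1.0, 1.0], size=m)  # fixed second-layer signs

def ntrf_features(x):
    """Gradient of f(x) = (1/sqrt(m)) * sum_j a_j * relu(W0[j] @ x)
    with respect to W0, flattened into one feature vector."""
    pre = W0 @ x                     # pre-activations, shape (m,)
    act = (pre > 0).astype(float)    # ReLU derivative indicator
    # d f / d W0[j] = a_j * 1[pre_j > 0] * x / sqrt(m)
    return ((a * act)[:, None] * x[None, :] / np.sqrt(m)).ravel()

# The NTRF model is linear in these features; here we fit it by least
# squares on a toy dataset of n labelled points just to show the interface.
n = 200
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0])                 # toy labels
Phi = np.stack([ntrf_features(x) for x in X])
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("training error:", np.mean(np.sign(Phi @ theta) != y))
```

In the paper's analysis, the training loss achievable by such a model on a given data distribution controls the generalization error of the SGD-trained wide network; the inner product of these gradient features is also what gives rise to the neural tangent kernel (NTK) connection mentioned in the abstract.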

 

DC Field | Value | Language
dc.contributor.author | Cao, Yuan | -
dc.contributor.author | Gu, Quanquan | -
dc.date.accessioned | 2021-09-15T08:25:48Z | -
dc.date.available | 2021-09-15T08:25:48Z | -
dc.date.issued | 2019 | -
dc.identifier.citation | 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 8-14 December 2019. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2020 | -
dc.identifier.issn | 1049-5258 | -
dc.identifier.uri | http://hdl.handle.net/10722/303679 | -
dc.description.abstract | We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected 0-1 loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a neural tangent random feature (NTRF) model. For data distributions that can be classified by the NTRF model with sufficiently small error, our result yields a generalization error bound of order $\tilde{O}(n^{-1/2})$ that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work. | -
dc.language | eng | -
dc.relation.ispartof | Advances in Neural Information Processing Systems 32 (NeurIPS 2019) | -
dc.title | Generalization bounds of stochastic gradient descent for wide and deep neural networks | -
dc.type | Conference_Paper | -
dc.description.nature | link_to_OA_fulltext | -
dc.identifier.scopus | eid_2-s2.0-85088779577 | -
dc.identifier.isi | WOS:000535866902046 | -
