Conference Paper: Generalization bounds of stochastic gradient descent for wide and deep neural networks
Title | Generalization bounds of stochastic gradient descent for wide and deep neural networks |
---|---|
Authors | Cao, Yuan; Gu, Quanquan |
Issue Date | 2019 |
Citation | 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 8-14 December 2019. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2020 |
Abstract | We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected 0-1 loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a neural tangent random feature (NTRF) model. For data distributions that can be classified by an NTRF model with sufficiently small error, our result yields a generalization error bound of order Õ(n^{-1/2}) that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work. |
Persistent Identifier | http://hdl.handle.net/10722/303679 |
ISSN | 1049-5258 (2020 SCImago Journal Rankings: 1.399) |
ISI Accession Number ID | WOS:000535866902046 |
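
The abstract above can be summarized in two displays. The LaTeX sketch below is a schematic restatement, not the paper's exact theorem: the notation (network f_{W_0} at its random initialization W_0, parameter displacement θ, SGD-trained weights Ŵ, surrogate loss ℓ, sample size n) is assumed here, and constants, norm constraints on θ, and logarithmic factors are suppressed.

```latex
% Schematic restatement of the abstract's claims (assumed notation;
% constants, norm constraints, and logarithmic factors are suppressed).

% The neural tangent random feature (NTRF) model: linear in the parameter
% displacement \theta through the gradient features of the network
% f_{W_0} taken at its random initialization W_0.
\[
  f^{\mathrm{NTRF}}_{W_0,\theta}(x)
    \;=\; f_{W_0}(x) \;+\; \big\langle \nabla_W f_{W_0}(x),\, \theta \big\rangle .
\]

% Width-independent generalization bound: the expected 0-1 loss of the
% SGD-trained network \widehat{W} is controlled by the training loss of
% the best NTRF model plus an \widetilde{O}(n^{-1/2}) term, where n is
% the number of training points.
\[
  \mathbb{E}\big[ L^{0\text{-}1}_{\mathcal{D}}(\widehat{W}) \big]
    \;\lesssim\;
  \inf_{\theta}\; \frac{1}{n} \sum_{i=1}^{n}
      \ell\big( y_i \, f^{\mathrm{NTRF}}_{W_0,\theta}(x_i) \big)
    \;+\; \widetilde{O}\big( n^{-1/2} \big).
\]
```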
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Cao, Yuan | - |
dc.contributor.author | Gu, Quanquan | - |
dc.date.accessioned | 2021-09-15T08:25:48Z | - |
dc.date.available | 2021-09-15T08:25:48Z | - |
dc.date.issued | 2019 | - |
dc.identifier.citation | 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 8-14 December 2019. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2020 | - |
dc.identifier.issn | 1049-5258 | - |
dc.identifier.uri | http://hdl.handle.net/10722/303679 | - |
dc.description.abstract | We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected 0-1 loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a neural tangent random feature (NTRF) model. For data distributions that can be classified by an NTRF model with sufficiently small error, our result yields a generalization error bound of order Õ(n^{-1/2}) that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work. | - |
dc.language | eng | - |
dc.relation.ispartof | Advances in Neural Information Processing Systems 32 (NeurIPS 2019) | - |
dc.title | Generalization bounds of stochastic gradient descent for wide and deep neural networks | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_OA_fulltext | - |
dc.identifier.scopus | eid_2-s2.0-85088779577 | - |
dc.identifier.isi | WOS:000535866902046 | - |
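
On the NTK connection mentioned in the abstract: at a schematic level, the kernel involved is the one induced by the same gradient features that define the NTRF model. The sketch below uses assumed notation and omits the width-dependent scaling the paper works with; it is not the paper's exact definition.

```latex
% Sketch of the finite-width neural tangent kernel at initialization,
% built from the same gradient features as the NTRF model above.
% Notation is assumed; any scaling in the network width m is omitted.
\[
  \Theta_{W_0}(x, x')
    \;=\; \big\langle \nabla_W f_{W_0}(x),\, \nabla_W f_{W_0}(x') \big\rangle .
\]
```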