Conference Paper: Generalization bounds of stochastic gradient descent for wide and deep neural networks
| Title | Generalization bounds of stochastic gradient descent for wide and deep neural networks |
|---|---|
| Authors | Cao, Yuan; Gu, Quanquan |
| Issue Date | 2019 |
| Citation | 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 8-14 December 2019. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2020 |
| Abstract | We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected 0-1 loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a neural tangent random feature (NTRF) model. For data distributions that can be classified by the NTRF model with sufficiently small error, our result yields a generalization error bound in the order of $\widetilde{O}(n^{-1/2})$ that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work. |
| Persistent Identifier | http://hdl.handle.net/10722/303679 |
| ISSN | 1049-5258 |
| 2020 SCImago Journal Rankings | 1.399 |
| ISI Accession Number ID | WOS:000535866902046 |
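
The abstract names two objects: the neural tangent random feature (NTRF) model and a width-independent $\widetilde{O}(n^{-1/2})$ generalization bound. The LaTeX sketch below paraphrases both for orientation only; the symbols ($f_W$ for the network, $W_0$ for the random initialization, $R$ for the reference radius, $n$ for the sample size, $\ell$ for the surrogate loss) and the exact form of the bound are approximations, not the precise statements or constants from the paper.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}

% NTRF model (sketch): functions linear in the network gradient taken at
% the random initialization W_0; the radius constraint is indicative only.
\[
  \mathcal{F}(W_0, R) =
  \Bigl\{ f(\cdot) = f_{W_0}(\cdot)
          + \bigl\langle \nabla_W f_{W_0}(\cdot),\, W - W_0 \bigr\rangle
          : \lVert W - W_0 \rVert \le R \Bigr\}
\]

% Shape of the main result (sketch): the expected 0-1 loss of the
% SGD-trained wide ReLU network is bounded by the best surrogate training
% loss over the NTRF class plus a width-independent O-tilde(n^{-1/2}) term.
\[
  \mathbb{E}\bigl[\ell^{0\text{-}1}(\hat f_{\mathrm{SGD}})\bigr]
  \;\lesssim\;
  \inf_{f \in \mathcal{F}(W_0, R)}
    \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(y_i f(x_i)\bigr)
  + \widetilde{O}\bigl(n^{-1/2}\bigr)
\]

\end{document}
```

The point carried by the abstract is that the $\widetilde{O}(n^{-1/2})$ term does not grow with the network width, so the bound remains meaningful in the over-parameterized regime.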
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Cao, Yuan | - |
| dc.contributor.author | Gu, Quanquan | - |
| dc.date.accessioned | 2021-09-15T08:25:48Z | - |
| dc.date.available | 2021-09-15T08:25:48Z | - |
| dc.date.issued | 2019 | - |
| dc.identifier.citation | 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 8-14 December 2019. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2020 | - |
| dc.identifier.issn | 1049-5258 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/303679 | - |
| dc.description.abstract | We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected 0-1 loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a neural tangent random feature (NTRF) model. For data distributions that can be classified by the NTRF model with sufficiently small error, our result yields a generalization error bound in the order of $\widetilde{O}(n^{-1/2})$ that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work. | - |
| dc.language | eng | - |
| dc.relation.ispartof | Advances in Neural Information Processing Systems 32 (NeurIPS 2019) | - |
| dc.title | Generalization bounds of stochastic gradient descent for wide and deep neural networks | - |
| dc.type | Conference_Paper | - |
| dc.description.nature | link_to_OA_fulltext | - |
| dc.identifier.scopus | eid_2-s2.0-85088779577 | - |
| dc.identifier.isi | WOS:000535866902046 | - |
