Conference Paper: Generalization bounds of stochastic gradient descent for wide and deep neural networks
| Title | Generalization bounds of stochastic gradient descent for wide and deep neural networks |
|---|---|
| Authors | Cao, Yuan; Gu, Quanquan |
| Issue Date | 2019 |
| Citation | 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 8-14 December 2019. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2020 |
| Abstract | We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected 0-1 loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a neural tangent random feature (NTRF) model. For data distributions that can be classified by the NTRF model with sufficiently small error, our result yields a generalization error bound in the order of $\widetilde{O}(n^{-1/2})$ that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work. |
| Persistent Identifier | http://hdl.handle.net/10722/303679 |
| ISSN | 1049-5258 |
| 2020 SCImago Journal Rankings | 1.399 |
| ISI Accession Number ID | WOS:000535866902046 |
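
The abstract names two objects: the neural tangent random feature (NTRF) model and a width-independent $\widetilde{O}(n^{-1/2})$ generalization bound. The LaTeX sketch below paraphrases both for orientation only; the symbols ($f_W$ for the network, $W_0$ for the random initialization, $R$ for the reference radius, $n$ for the sample size, $\ell$ for the surrogate loss) and the exact form of the bound are approximations, not the precise statements or constants from the paper.

```latex
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}

% NTRF model (sketch): functions linear in the network gradient taken at
% the random initialization W_0; the radius constraint is indicative only.
\[
  \mathcal{F}(W_0, R) =
  \Bigl\{ f(\cdot) = f_{W_0}(\cdot)
          + \bigl\langle \nabla_W f_{W_0}(\cdot),\, W - W_0 \bigr\rangle
          : \lVert W - W_0 \rVert \le R \Bigr\}
\]

% Shape of the main result (sketch): the expected 0-1 loss of the
% SGD-trained wide ReLU network is bounded by the best surrogate training
% loss over the NTRF class plus a width-independent O-tilde(n^{-1/2}) term.
\[
  \mathbb{E}\bigl[\ell^{0\text{-}1}(\hat f_{\mathrm{SGD}})\bigr]
  \;\lesssim\;
  \inf_{f \in \mathcal{F}(W_0, R)}
    \frac{1}{n}\sum_{i=1}^{n} \ell\bigl(y_i f(x_i)\bigr)
  + \widetilde{O}\bigl(n^{-1/2}\bigr)
\]

\end{document}
```

The point carried by the abstract is that the $\widetilde{O}(n^{-1/2})$ term does not grow with the network width, so the bound remains meaningful in the over-parameterized regime.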
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Cao, Yuan | - |
| dc.contributor.author | Gu, Quanquan | - |
| dc.date.accessioned | 2021-09-15T08:25:48Z | - |
| dc.date.available | 2021-09-15T08:25:48Z | - |
| dc.date.issued | 2019 | - |
| dc.identifier.citation | 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 8-14 December 2019. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019), 2020 | - |
| dc.identifier.issn | 1049-5258 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/303679 | - |
| dc.description.abstract | We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected 0-1 loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a neural tangent random feature (NTRF) model. For data distributions that can be classified by the NTRF model with sufficiently small error, our result yields a generalization error bound in the order of $\widetilde{O}(n^{-1/2})$ that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work. | - |
| dc.language | eng | - |
| dc.relation.ispartof | Advances in Neural Information Processing Systems 32 (NeurIPS 2019) | - |
| dc.title | Generalization bounds of stochastic gradient descent for wide and deep neural networks | - |
| dc.type | Conference_Paper | - |
| dc.description.nature | link_to_OA_fulltext | - |
| dc.identifier.scopus | eid_2-s2.0-85088779577 | - |
| dc.identifier.isi | WOS:000535866902046 | - |
