Conference Paper: Learning and generalization in overparameterized neural networks, going beyond two layers
Title | Learning and generalization in overparameterized neural networks, going beyond two layers |
---|---|
Authors | Allen-Zhu, Zeyuan; Li, Yuanzhi; Liang, Yingyu |
Issue Date | 2019 |
Citation | Advances in Neural Information Processing Systems, 2019, v. 32 |
Abstract | The fundamental learning theory behind neural networks remains largely open. What classes of functions can neural networks actually learn? Why doesn't the trained network overfit when it is overparameterized? In this work, we prove that overparameterized neural networks can learn some notable concept classes, including two and three-layer networks with fewer parameters and smooth activations. Moreover, the learning can be simply done by SGD (stochastic gradient descent) or its variants in polynomial time using polynomially many samples. The sample complexity can also be almost independent of the number of parameters in the network. On the technique side, our analysis goes beyond the so-called NTK (neural tangent kernel) linearization of neural networks in prior works. We establish a new notion of quadratic approximation of the neural network, and connect it to the SGD theory of escaping saddle points. |
Persistent Identifier | http://hdl.handle.net/10722/341279 |
ISSN | 1049-5258 (2020 SCImago Journal Rankings: 1.399) |
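To make the setting described in the abstract concrete, below is a minimal, self-contained sketch (not the paper's actual construction, proof technique, or hyperparameters): SGD trains a heavily overparameterized two-layer network on data labeled by a much smaller two-layer "target" network with a smooth activation. All widths, learning rates, and step counts are hypothetical choices for illustration only.

```python
# Hypothetical illustration of the abstract's setting: an overparameterized
# two-layer network trained by plain SGD to fit a concept generated by a much
# smaller two-layer network with a smooth (tanh) activation.
import numpy as np

rng = np.random.default_rng(0)

d, m_target, m_learner = 10, 5, 1000  # input dim; small target width; overparameterized width

# Ground-truth "concept": a small two-layer network with a smooth activation.
W_star = rng.normal(size=(m_target, d)) / np.sqrt(d)
a_star = rng.normal(size=m_target) / np.sqrt(m_target)
f_star = lambda X: np.tanh(X @ W_star.T) @ a_star

# Overparameterized learner: far more hidden units than the target needs.
W = rng.normal(size=(m_learner, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m_learner) / np.sqrt(m_learner)  # fixed output layer

lr, n_steps, batch = 0.5, 2000, 64
for step in range(n_steps):
    X = rng.normal(size=(batch, d))   # fresh samples each step (online SGD)
    y = f_star(X)
    H = np.tanh(X @ W.T)              # hidden activations, shape (batch, m_learner)
    pred = H @ a
    err = pred - y                    # squared-loss residual
    # Gradient of 0.5 * mean((pred - y)^2) w.r.t. W; output weights `a` stay fixed.
    dH = err[:, None] * a[None, :]
    dW = ((dH * (1 - H**2)).T @ X) / batch
    W -= lr * dW
    if step % 500 == 0:
        print(f"step {step:4d}  loss {0.5 * np.mean(err**2):.5f}")
```

Keeping the output layer fixed and training only the hidden-layer weights is a common simplification in this line of analysis; it is used here purely to keep the sketch short, not because the paper requires it.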
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Allen-Zhu, Zeyuan | - |
dc.contributor.author | Li, Yuanzhi | - |
dc.contributor.author | Liang, Yingyu | - |
dc.date.accessioned | 2024-03-13T08:41:34Z | - |
dc.date.available | 2024-03-13T08:41:34Z | - |
dc.date.issued | 2019 | - |
dc.identifier.citation | Advances in Neural Information Processing Systems, 2019, v. 32 | - |
dc.identifier.issn | 1049-5258 | - |
dc.identifier.uri | http://hdl.handle.net/10722/341279 | - |
dc.description.abstract | The fundamental learning theory behind neural networks remains largely open. What classes of functions can neural networks actually learn? Why doesn't the trained network overfit when it is overparameterized? In this work, we prove that overparameterized neural networks can learn some notable concept classes, including two and three-layer networks with fewer parameters and smooth activations. Moreover, the learning can be simply done by SGD (stochastic gradient descent) or its variants in polynomial time using polynomially many samples. The sample complexity can also be almost independent of the number of parameters in the network. On the technique side, our analysis goes beyond the so-called NTK (neural tangent kernel) linearization of neural networks in prior works. We establish a new notion of quadratic approximation of the neural network, and connect it to the SGD theory of escaping saddle points. | - |
dc.language | eng | - |
dc.relation.ispartof | Advances in Neural Information Processing Systems | - |
dc.title | Learning and generalization in overparameterized neural networks, going beyond two layers | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.scopus | eid_2-s2.0-85087338191 | - |
dc.identifier.volume | 32 | - |