Conference Paper: Learning overparameterized neural networks via stochastic gradient descent on structured data

Title: Learning overparameterized neural networks via stochastic gradient descent on structured data
Authors: Li, Yuanzhi; Liang, Yingyu
Issue Date: 2018
Citation: Advances in Neural Information Processing Systems, 2018, v. 2018-December, p. 8157-8166
Abstract: Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with a small generalization error, albeit the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks and can be verified based on empirical studies on synthetic data and on the MNIST dataset.
Persistent Identifier: http://hdl.handle.net/10722/341245
ISSN: 1049-5258
2020 SCImago Journal Rankings: 1.399
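
The abstract above describes training a two-layer overparameterized ReLU network with SGD from random initialization on data drawn from mixtures of well-separated distributions. The following is a minimal illustrative sketch of that setting, not the authors' code: the synthetic data generator, hidden width, learning rate, and the choice to keep the output layer fixed are all assumptions made purely for illustration.

# Illustrative sketch only (not the paper's implementation): a two-layer
# overparameterized ReLU network for multi-class classification, trained with
# plain SGD from random initialization on synthetic data from well-separated
# Gaussian clusters. All hyperparameters below are arbitrary choices.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "structured" data: k well-separated clusters, one per class.
d, k, n = 20, 4, 400                       # input dim, classes, samples
centers = rng.normal(size=(k, d)) * 5.0    # well-separated cluster means
labels = rng.integers(0, k, size=n)
X = centers[labels] + rng.normal(scale=0.1, size=(n, d))

# Overparameterized two-layer ReLU network; only the first layer is trained,
# the output layer is fixed at random initialization (a simplification
# assumed here for the sketch).
m = 1024                                   # hidden width, much larger than needed
W = rng.normal(scale=1.0 / np.sqrt(d), size=(m, d))    # trained first layer
A = rng.choice([-1.0, 1.0], size=(k, m)) / np.sqrt(m)  # fixed output layer

def forward(x):
    h = np.maximum(W @ x, 0.0)             # ReLU hidden activations
    return A @ h, h                        # logits, hidden layer

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

lr, epochs = 0.05, 5
for _ in range(epochs):
    for i in rng.permutation(n):           # one SGD step per example
        x, y = X[i], labels[i]
        logits, h = forward(x)
        p = softmax(logits)
        p[y] -= 1.0                        # gradient of cross-entropy w.r.t. logits
        grad_h = A.T @ p                   # backprop through fixed output layer
        grad_h[h <= 0.0] = 0.0             # ReLU subgradient
        W -= lr * np.outer(grad_h, x)

preds = np.array([np.argmax(forward(x)[0]) for x in X])
print("training accuracy:", (preds == labels).mean())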


DC Field: Value
dc.contributor.author: Li, Yuanzhi
dc.contributor.author: Liang, Yingyu
dc.date.accessioned: 2024-03-13T08:41:18Z
dc.date.available: 2024-03-13T08:41:18Z
dc.date.issued: 2018
dc.identifier.citation: Advances in Neural Information Processing Systems, 2018, v. 2018-December, p. 8157-8166
dc.identifier.issn: 1049-5258
dc.identifier.uri: http://hdl.handle.net/10722/341245
dc.description.abstract: Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with a small generalization error, albeit the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks and can be verified based on empirical studies on synthetic data and on the MNIST dataset.
dc.language: eng
dc.relation.ispartof: Advances in Neural Information Processing Systems
dc.title: Learning overparameterized neural networks via stochastic gradient descent on structured data
dc.type: Conference_Paper
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.scopus: eid_2-s2.0-85064818888
dc.identifier.volume: 2018-December
dc.identifier.spage: 8157
dc.identifier.epage: 8166

Export: via the OAI-PMH interface (XML formats) or to other non-XML formats.
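
The record above can be harvested through the repository's OAI-PMH interface mentioned in the export options. Below is a rough sketch of such a request in Python; the endpoint URL and OAI identifier are hypothetical placeholders (assumptions), and only the request parameters (verb, metadataPrefix, identifier) follow the standard OAI-PMH 2.0 protocol.

# Hedged sketch: fetch a record's Dublin Core metadata via OAI-PMH.
# The endpoint and identifier below are HYPOTHETICAL placeholders; consult the
# repository's documentation for its real OAI-PMH base URL and record id.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "https://example-repository.org/oai/request"   # placeholder endpoint
params = {
    "verb": "GetRecord",                                   # standard OAI-PMH verb
    "metadataPrefix": "oai_dc",                            # unqualified Dublin Core
    "identifier": "oai:example-repository.org:10722/341245",  # placeholder id
}

url = BASE_URL + "?" + urllib.parse.urlencode(params)
with urllib.request.urlopen(url) as resp:
    tree = ET.parse(resp)

# Print every Dublin Core element (dc:title, dc:contributor, dc:date, ...).
DC = "{http://purl.org/dc/elements/1.1/}"
for elem in tree.iter():
    if elem.tag.startswith(DC):
        print(elem.tag[len(DC):] + ":", (elem.text or "").strip())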