Conference Paper: The Benefits of Implicit Regularization from SGD in Least Squares Problems

Title: The Benefits of Implicit Regularization from SGD in Least Squares Problems
Authors: Zou, Difan; Wu, Jingfeng; Braverman, Vladimir; Gu, Quanquan; Foster, Dean P.; Kakade, Sham M.
Issue Date: 2021
Citation: Advances in Neural Information Processing Systems, 2021, v. 7, p. 5456-5468
Abstract: Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice, which has been hypothesized to play an important role in the generalization of modern machine learning approaches. In this work, we seek to understand these issues in the simpler setting of linear regression (including both underparameterized and overparameterized regimes), where our goal is to make sharp instance-based comparisons of the implicit regularization afforded by (unregularized) average SGD with the explicit regularization of ridge regression. For a broad class of least squares problem instances (that are natural in high-dimensional settings), we show: (1) for every problem instance and for every ridge parameter, (unregularized) SGD, when provided with logarithmically more samples than that provided to the ridge algorithm, generalizes no worse than the ridge solution (provided SGD uses a tuned constant stepsize); (2) conversely, there exist instances (in this wide problem class) where optimally-tuned ridge regression requires quadratically more samples than SGD in order to have the same generalization performance. Taken together, our results show that, up to the logarithmic factors, the generalization performance of SGD is always no worse than that of ridge regression in a wide range of overparameterized problems, and, in fact, could be much better for some problem instances. More generally, our results show how algorithmic regularization has important consequences even in simpler (overparameterized) convex settings.
Persistent Identifier: http://hdl.handle.net/10722/316659
ISSN: 1049-5258
2020 SCImago Journal Rankings: 1.399
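
The abstract compares the implicit regularization of (unregularized) averaged SGD with the explicit regularization of ridge regression on least squares problems. The snippet below is a minimal illustrative sketch of that comparison, not the authors' code: the power-law covariance spectrum, noise level, sample sizes, stepsize heuristic, and ridge parameter lam are all assumptions chosen for illustration.

```python
# Illustrative sketch (assumptions, not the paper's code): compare averaged
# constant-stepsize SGD with ridge regression on a synthetic high-dimensional
# least-squares instance with a power-law covariance spectrum.
import numpy as np

rng = np.random.default_rng(0)

d, n_ridge, n_sgd = 200, 200, 400            # SGD is given more samples, as in result (1)
eigs = 1.0 / np.arange(1, d + 1) ** 2        # assumed power-law spectrum of the data covariance
w_star = rng.normal(size=d) * np.sqrt(eigs)  # ground-truth parameter
noise = 0.1                                  # label noise level (assumption)

def sample(n):
    """Draw n Gaussian samples with diagonal covariance diag(eigs)."""
    X = rng.normal(size=(n, d)) * np.sqrt(eigs)
    y = X @ w_star + noise * rng.normal(size=n)
    return X, y

def excess_risk(w):
    """Population excess risk E[(x^T (w - w_star))^2] under the design above."""
    return float(np.sum(eigs * (w - w_star) ** 2))

# Ridge regression: explicit regularization with penalty lam (untuned here).
X, y = sample(n_ridge)
lam = 1e-2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# One-pass SGD with a constant stepsize, reporting the average of the iterates.
Xs, ys = sample(n_sgd)
w = np.zeros(d)
w_avg = np.zeros(d)
eta = 0.5 / np.max(np.sum(Xs ** 2, axis=1))  # heuristic constant stepsize (assumption)
for t in range(n_sgd):
    x_t, y_t = Xs[t], ys[t]
    w -= eta * (x_t @ w - y_t) * x_t         # stochastic gradient step on the squared loss
    w_avg += (w - w_avg) / (t + 1)           # running average of the iterates

print("ridge excess risk:  ", excess_risk(w_ridge))
print("avg-SGD excess risk:", excess_risk(w_avg))
```

The sketch only mirrors the setup of the comparison; the paper's instance-based guarantees concern tuned stepsizes and optimally-tuned ridge parameters, which the fixed choices above do not attempt to reproduce.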

 

DC Field: Value
dc.contributor.author: Zou, Difan
dc.contributor.author: Wu, Jingfeng
dc.contributor.author: Braverman, Vladimir
dc.contributor.author: Gu, Quanquan
dc.contributor.author: Foster, Dean P.
dc.contributor.author: Kakade, Sham M.
dc.date.accessioned: 2022-09-14T11:41:00Z
dc.date.available: 2022-09-14T11:41:00Z
dc.date.issued: 2021
dc.identifier.citation: Advances in Neural Information Processing Systems, 2021, v. 7, p. 5456-5468
dc.identifier.issn: 1049-5258
dc.identifier.uri: http://hdl.handle.net/10722/316659
dc.description.abstract: Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice, which has been hypothesized to play an important role in the generalization of modern machine learning approaches. In this work, we seek to understand these issues in the simpler setting of linear regression (including both underparameterized and overparameterized regimes), where our goal is to make sharp instance-based comparisons of the implicit regularization afforded by (unregularized) average SGD with the explicit regularization of ridge regression. For a broad class of least squares problem instances (that are natural in high-dimensional settings), we show: (1) for every problem instance and for every ridge parameter, (unregularized) SGD, when provided with logarithmically more samples than that provided to the ridge algorithm, generalizes no worse than the ridge solution (provided SGD uses a tuned constant stepsize); (2) conversely, there exist instances (in this wide problem class) where optimally-tuned ridge regression requires quadratically more samples than SGD in order to have the same generalization performance. Taken together, our results show that, up to the logarithmic factors, the generalization performance of SGD is always no worse than that of ridge regression in a wide range of overparameterized problems, and, in fact, could be much better for some problem instances. More generally, our results show how algorithmic regularization has important consequences even in simpler (overparameterized) convex settings.
dc.language: eng
dc.relation.ispartof: Advances in Neural Information Processing Systems
dc.title: The Benefits of Implicit Regularization from SGD in Least Squares Problems
dc.type: Conference_Paper
dc.description.nature: link_to_OA_fulltext
dc.identifier.scopus: eid_2-s2.0-85131726722
dc.identifier.volume: 7
dc.identifier.spage: 5456
dc.identifier.epage: 5468
