Article: Generalization performance of multi-pass stochastic gradient descent with convex loss functions

Title: Generalization performance of multi-pass stochastic gradient descent with convex loss functions
Authors: Lei, Y; Hu, T; Tang, K
Issue Date: 31-Jan-2021
Publisher: Journal of Machine Learning Research
Citation: Journal of Machine Learning Research, 2021, v. 22, n. 25, p. 1-41
Abstract: Stochastic gradient descent (SGD) has become the method of choice to tackle large-scale datasets due to its low computational cost and good practical performance. Learning rate analysis, either capacity-independent or capacity-dependent, provides a unifying viewpoint to study the computational and statistical properties of SGD, as well as the implicit regularization by tuning the number of passes. Existing capacity-independent learning rates require a nontrivial bounded subgradient assumption and a smoothness assumption to be optimal. Furthermore, existing capacity-dependent learning rates are only established for the specific least squares loss with a special structure. In this paper, we provide both optimal capacity-independent and capacity-dependent learning rates for SGD with general convex loss functions. Our results require neither bounded subgradient assumptions nor smoothness assumptions, and are stated with high probability. We achieve this improvement by a refined estimate on the norm of SGD iterates based on a careful martingale analysis and concentration inequalities on empirical processes.
Persistent Identifier: http://hdl.handle.net/10722/337191
ISSN: 1532-4435
2021 Impact Factor: 5.177
2020 SCImago Journal Rankings: 1.240
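
The abstract above concerns multi-pass SGD with convex loss functions, where the number of passes over the data acts as an implicit regularizer. Below is a minimal illustrative sketch of that general setup, not the paper's algorithm, analysis, or experiments: the linear model, the logistic loss (one example of a convex loss), the synthetic data, and the step-size schedule eta0 / sqrt(t) are all assumptions chosen for illustration.

```python
# Minimal sketch: multi-pass SGD on a linear model with the logistic loss.
# The number of passes plays the role of the implicit regularization
# parameter discussed in the abstract; all other choices are illustrative.
import numpy as np

def multi_pass_sgd(X, y, passes=5, eta0=0.1, seed=0):
    """Run SGD over the data `passes` times with step size eta0 / sqrt(t)."""
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    rng = np.random.default_rng(seed)
    for _ in range(passes):
        for i in rng.permutation(n):          # one pass = one shuffled sweep
            t += 1
            margin = y[i] * X[i].dot(w)
            # (sub)gradient of the logistic loss log(1 + exp(-margin)) w.r.t. w
            grad = -y[i] * X[i] / (1.0 + np.exp(margin))
            w -= (eta0 / np.sqrt(t)) * grad
    return w

# Toy usage on synthetic data: increasing `passes` trades optimization error
# against overfitting, which is the tuning discussed in the abstract.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
w_star = rng.standard_normal(5)
y = np.sign(X.dot(w_star) + 0.1 * rng.standard_normal(200))
w_hat = multi_pass_sgd(X, y, passes=3)
print("training accuracy:", np.mean(np.sign(X.dot(w_hat)) == y))
```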

 

DC Field: Value
dc.contributor.author: Lei, Y
dc.contributor.author: Hu, T
dc.contributor.author: Tang, K
dc.date.accessioned: 2024-03-11T10:18:48Z
dc.date.available: 2024-03-11T10:18:48Z
dc.date.issued: 2021-01-31
dc.identifier.citation: Journal of Machine Learning Research, 2021, v. 22, n. 25, p. 1-41
dc.identifier.issn: 1532-4435
dc.identifier.uri: http://hdl.handle.net/10722/337191
dc.description.abstract: Stochastic gradient descent (SGD) has become the method of choice to tackle large-scale datasets due to its low computational cost and good practical performance. Learning rate analysis, either capacity-independent or capacity-dependent, provides a unifying viewpoint to study the computational and statistical properties of SGD, as well as the implicit regularization by tuning the number of passes. Existing capacity-independent learning rates require a nontrivial bounded subgradient assumption and a smoothness assumption to be optimal. Furthermore, existing capacity-dependent learning rates are only established for the specific least squares loss with a special structure. In this paper, we provide both optimal capacity-independent and capacity-dependent learning rates for SGD with general convex loss functions. Our results require neither bounded subgradient assumptions nor smoothness assumptions, and are stated with high probability. We achieve this improvement by a refined estimate on the norm of SGD iterates based on a careful martingale analysis and concentration inequalities on empirical processes.
dc.language: eng
dc.publisher: Journal of Machine Learning Research
dc.relation.ispartof: Journal of Machine Learning Research
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.title: Generalization performance of multi-pass stochastic gradient descent with convex loss functions
dc.type: Article
dc.identifier.volume: 22
dc.identifier.issue: 25
dc.identifier.spage: 1
dc.identifier.epage: 41
dc.identifier.eissn: 1533-7928
dc.identifier.issnl: 1532-4435
