Article: Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions

Title: Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions
Authors: Lei, Yunwen; Hu, Ting; Li, Guiying; Tang, Ke
Keywords: Learning theory; nonconvex optimization; Polyak-Łojasiewicz condition; stochastic gradient descent (SGD)
Issue Date: 2020
Citation: IEEE Transactions on Neural Networks and Learning Systems, 2020, v. 31, n. 10, p. 4394-4400
Abstract: Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing theoretical results for SGD applied to nonconvex objective functions are far from mature. For example, existing results require imposing a nontrivial assumption on the uniform boundedness of gradients for all iterates encountered in the learning process, which is hard to verify in practical implementations. In this article, we establish a rigorous theoretical foundation for SGD in nonconvex learning by showing that this boundedness assumption can be removed without affecting convergence rates, and by relaxing the standard smoothness assumption to Hölder continuity of gradients. In particular, we establish sufficient conditions for almost sure convergence as well as optimal convergence rates for SGD applied to both general nonconvex and gradient-dominated objective functions. Linear convergence is further derived in the case of zero variance.
Persistent Identifier: http://hdl.handle.net/10722/329652
ISSN: 2162-237X
2021 Impact Factor: 14.255
2020 SCImago Journal Rankings: 2.882
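
As a quick illustration of the iteration analyzed in the abstract above, the following is a minimal sketch (not taken from the article) of plain SGD with a decaying step size and no gradient clipping or projection, applied to the standard nonconvex Polyak-Łojasiewicz example f(w) = w^2 + 3 sin^2(w). The objective, the step-size schedule eta_t = eta_1 / t, and the Gaussian gradient noise are all illustrative assumptions, not the article's setup.

```python
# Minimal sketch (not from the article): plain SGD iterates w_{t+1} = w_t - eta_t * g_t
# on a one-dimensional nonconvex objective satisfying the Polyak-Lojasiewicz condition,
# f(w) = w^2 + 3*sin(w)^2. The step sizes eta_t = eta_1 / t and the Gaussian gradient
# noise are illustrative choices only.
import numpy as np

def grad_f(w):
    # Gradient of f(w) = w^2 + 3*sin(w)^2.
    return 2.0 * w + 6.0 * np.sin(w) * np.cos(w)

def sgd(w0=3.0, eta1=0.1, noise_std=0.5, n_iters=10_000, seed=0):
    rng = np.random.default_rng(seed)
    w = w0
    for t in range(1, n_iters + 1):
        g = grad_f(w) + noise_std * rng.standard_normal()  # unbiased stochastic gradient
        w -= (eta1 / t) * g                                 # decaying step size, no clipping or projection
    return w

if __name__ == "__main__":
    w_final = sgd()
    f = lambda w: w ** 2 + 3.0 * np.sin(w) ** 2
    print(f"w_T = {w_final:.4f}, f(w_T) = {f(w_final):.6f}")  # global minimum is f(0) = 0
```

Because this f satisfies the PL (gradient-dominated) condition, the unclipped iterates approach the global minimum f(0) = 0 despite nonconvexity, which is the kind of regime the article's convergence rates address.
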

 

DC Field: Value
dc.contributor.author: Lei, Yunwen
dc.contributor.author: Hu, Ting
dc.contributor.author: Li, Guiying
dc.contributor.author: Tang, Ke
dc.date.accessioned: 2023-08-09T03:34:21Z
dc.date.available: 2023-08-09T03:34:21Z
dc.date.issued: 2020
dc.identifier.citation: IEEE Transactions on Neural Networks and Learning Systems, 2020, v. 31, n. 10, p. 4394-4400
dc.identifier.issn: 2162-237X
dc.identifier.uri: http://hdl.handle.net/10722/329652
dc.description.abstract: Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing theoretical results for SGD applied to nonconvex objective functions are far from mature. For example, existing results require imposing a nontrivial assumption on the uniform boundedness of gradients for all iterates encountered in the learning process, which is hard to verify in practical implementations. In this article, we establish a rigorous theoretical foundation for SGD in nonconvex learning by showing that this boundedness assumption can be removed without affecting convergence rates, and by relaxing the standard smoothness assumption to Hölder continuity of gradients. In particular, we establish sufficient conditions for almost sure convergence as well as optimal convergence rates for SGD applied to both general nonconvex and gradient-dominated objective functions. Linear convergence is further derived in the case of zero variance.
dc.language: eng
dc.relation.ispartof: IEEE Transactions on Neural Networks and Learning Systems
dc.subject: Learning theory
dc.subject: nonconvex optimization
dc.subject: Polyak-Łojasiewicz condition
dc.subject: stochastic gradient descent (SGD)
dc.title: Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions
dc.type: Article
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1109/TNNLS.2019.2952219
dc.identifier.pmid: 31831449
dc.identifier.scopus: eid_2-s2.0-85092680343
dc.identifier.volume: 31
dc.identifier.issue: 10
dc.identifier.spage: 4394
dc.identifier.epage: 4400
dc.identifier.eissn: 2162-2388
