
Article: On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization

Authors: Zhou, Dongruo; Chen, Jinghui; Cao, Yuan; Yang, Ziyan; Gu, Quanquan
Issue Date: 16-Mar-2024
Publisher: OpenReview.net
Citation: Transactions on Machine Learning Research, 2024
Abstract

Adaptive gradient methods are workhorses of deep learning. However, their convergence guarantees for nonconvex optimization have not been thoroughly studied. In this paper, we provide a fine-grained convergence analysis for a general class of adaptive gradient methods, including AMSGrad, RMSProp, and AdaGrad. For smooth nonconvex functions, we prove that adaptive gradient methods converge in expectation to a first-order stationary point. Our convergence rate improves on existing results for adaptive gradient methods in its dependence on the dimension. In addition, we prove high-probability bounds on the convergence rates of AMSGrad, RMSProp, and AdaGrad, which had not been established before. Our analyses shed light on the mechanism behind adaptive gradient methods in optimizing nonconvex objectives.
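To make the object of study concrete, the following is a minimal NumPy sketch of the AMSGrad update rule as it appears in the literature (exponential moving averages of the gradient and squared gradient, plus a running maximum of the second-moment estimate). It is not the paper's analysis, and the test function, step size, and iteration count are illustrative choices, not taken from the paper:

```python
import numpy as np

def amsgrad(grad, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Sketch of AMSGrad: like Adam, but the denominator uses the running
    max of the second-moment estimate, so effective steps never grow."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)      # first-moment (momentum) estimate
    v = np.zeros_like(x)      # second-moment estimate
    v_hat = np.zeros_like(x)  # running max of v (the AMSGrad correction)
    for _ in range(steps):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        v_hat = np.maximum(v_hat, v)
        x = x - lr * m / (np.sqrt(v_hat) + eps)
    return x

# Illustrative smooth nonconvex objective f(x) = sum(x^4 - 3 x^2); its
# gradient vanishes at the first-order stationary points 0 and ±sqrt(3/2).
grad = lambda x: 4 * x**3 - 6 * x
x_star = amsgrad(grad, x0=[1.0, -0.8])
print(np.linalg.norm(grad(x_star)))  # gradient norm is small near a stationary point
```

Running the sketch from a generic starting point drives the gradient norm toward zero, matching the kind of first-order stationarity guarantee the abstract describes (the sketch omits Adam-style bias correction, which does not affect this qualitative behavior).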


Persistent Identifier: http://hdl.handle.net/10722/347192
ISSN: 2835-8856

DC Field: Value
dc.contributor.author: Zhou, Dongruo
dc.contributor.author: Chen, Jinghui
dc.contributor.author: Cao, Yuan
dc.contributor.author: Yang, Ziyan
dc.contributor.author: Gu, Quanquan
dc.date.accessioned: 2024-09-18T00:31:01Z
dc.date.available: 2024-09-18T00:31:01Z
dc.date.issued: 2024-03-16
dc.identifier.citation: Transactions on Machine Learning Research, 2024
dc.identifier.issn: 2835-8856
dc.identifier.uri: http://hdl.handle.net/10722/347192
dc.description.abstract: Adaptive gradient methods are workhorses of deep learning. However, their convergence guarantees for nonconvex optimization have not been thoroughly studied. In this paper, we provide a fine-grained convergence analysis for a general class of adaptive gradient methods, including AMSGrad, RMSProp, and AdaGrad. For smooth nonconvex functions, we prove that adaptive gradient methods converge in expectation to a first-order stationary point. Our convergence rate improves on existing results for adaptive gradient methods in its dependence on the dimension. In addition, we prove high-probability bounds on the convergence rates of AMSGrad, RMSProp, and AdaGrad, which had not been established before. Our analyses shed light on the mechanism behind adaptive gradient methods in optimizing nonconvex objectives.
dc.language: eng
dc.publisher: OpenReview.net
dc.relation.ispartof: Transactions on Machine Learning Research
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.title: On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization
dc.type: Article
dc.identifier.eissn: 2835-8856
