
Article: On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization

Authors: Zhou, Dongruo; Chen, Jinghui; Cao, Yuan; Yang, Ziyan; Gu, Quanquan
Issue Date: 16-Mar-2024
Publisher: OpenReview.net
Citation: Transactions on Machine Learning Research, 2024
Abstract

Adaptive gradient methods are workhorses of deep learning. However, their convergence guarantees for nonconvex optimization have not been thoroughly studied. In this paper, we provide a fine-grained convergence analysis for a general class of adaptive gradient methods, including AMSGrad, RMSProp, and AdaGrad. For smooth nonconvex functions, we prove that adaptive gradient methods converge in expectation to a first-order stationary point. Our convergence rate improves on existing results for adaptive gradient methods in its dependence on the dimension. In addition, we prove high-probability bounds on the convergence rates of AMSGrad, RMSProp, and AdaGrad, which had not been established before. Our analyses shed light on the mechanism behind adaptive gradient methods in optimizing nonconvex objectives.
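To make the object of study concrete, the following is a minimal NumPy sketch of the AMSGrad update rule as it appears in the literature (exponential moving averages of the gradient and squared gradient, plus a running maximum of the second-moment estimate). It is not the paper's analysis, and the test function, step size, and iteration count are illustrative choices, not taken from the paper:

```python
import numpy as np

def amsgrad(grad, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Sketch of AMSGrad: like Adam, but the denominator uses the running
    max of the second-moment estimate, so effective steps never grow."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)      # first-moment (momentum) estimate
    v = np.zeros_like(x)      # second-moment estimate
    v_hat = np.zeros_like(x)  # running max of v (the AMSGrad correction)
    for _ in range(steps):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        v_hat = np.maximum(v_hat, v)
        x = x - lr * m / (np.sqrt(v_hat) + eps)
    return x

# Illustrative smooth nonconvex objective f(x) = sum(x^4 - 3 x^2); its
# gradient vanishes at the first-order stationary points 0 and ±sqrt(3/2).
grad = lambda x: 4 * x**3 - 6 * x
x_star = amsgrad(grad, x0=[1.0, -0.8])
print(np.linalg.norm(grad(x_star)))  # gradient norm is small near a stationary point
```

Running the sketch from a generic starting point drives the gradient norm toward zero, matching the kind of first-order stationarity guarantee the abstract describes (the sketch omits Adam-style bias correction, which does not affect this qualitative behavior).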


Persistent Identifier: http://hdl.handle.net/10722/347192
ISSN: 2835-8856

DC Field: Value
dc.contributor.author: Zhou, Dongruo
dc.contributor.author: Chen, Jinghui
dc.contributor.author: Cao, Yuan
dc.contributor.author: Yang, Ziyan
dc.contributor.author: Gu, Quanquan
dc.date.accessioned: 2024-09-18T00:31:01Z
dc.date.available: 2024-09-18T00:31:01Z
dc.date.issued: 2024-03-16
dc.identifier.citation: Transactions on Machine Learning Research, 2024
dc.identifier.issn: 2835-8856
dc.identifier.uri: http://hdl.handle.net/10722/347192
dc.description.abstract: Adaptive gradient methods are workhorses of deep learning. However, their convergence guarantees for nonconvex optimization have not been thoroughly studied. In this paper, we provide a fine-grained convergence analysis for a general class of adaptive gradient methods, including AMSGrad, RMSProp, and AdaGrad. For smooth nonconvex functions, we prove that adaptive gradient methods converge in expectation to a first-order stationary point. Our convergence rate improves on existing results for adaptive gradient methods in its dependence on the dimension. In addition, we prove high-probability bounds on the convergence rates of AMSGrad, RMSProp, and AdaGrad, which had not been established before. Our analyses shed light on the mechanism behind adaptive gradient methods in optimizing nonconvex objectives.
dc.language: eng
dc.publisher: OpenReview.net
dc.relation.ispartof: Transactions on Machine Learning Research
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.title: On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization
dc.type: Article
dc.identifier.eissn: 2835-8856
