Links for fulltext (may require subscription):
- Publisher Website (DOI): 10.1109/TNNLS.2019.2952219
- Scopus: eid_2-s2.0-85092680343
- PMID: 31831449
Article: Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions
Title | Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions |
---|---|
Authors | Lei, Yunwen; Hu, Ting; Li, Guiying; Tang, Ke |
Keywords | Learning theory; nonconvex optimization; Polyak-Łojasiewicz condition; stochastic gradient descent (SGD) |
Issue Date | 2020 |
Citation | IEEE Transactions on Neural Networks and Learning Systems, 2020, v. 31, n. 10, p. 4394-4400 |
Abstract | Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing theoretical results for SGD applied to nonconvex objective functions are far from mature. For example, existing results require imposing a nontrivial assumption on the uniform boundedness of gradients for all iterates encountered in the learning process, which is hard to verify in practical implementations. In this article, we establish a rigorous theoretical foundation for SGD in nonconvex learning by showing that this boundedness assumption can be removed without affecting convergence rates, and that the standard smoothness assumption can be relaxed to Hölder continuity of gradients. In particular, we establish sufficient conditions for almost sure convergence as well as optimal convergence rates for SGD applied to both general nonconvex and gradient-dominated objective functions. Linear convergence is further derived in the case of zero variance. |
Persistent Identifier | http://hdl.handle.net/10722/329652 |
ISSN | 2162-237X |
Journal Metrics | 2021 Impact Factor: 14.255; 2020 SCImago Journal Rankings: 2.882 |
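The abstract's setting can be illustrated with a minimal sketch (not the paper's own code): plain SGD on `f(x) = x² + 3·sin²(x)`, a standard nonconvex one-dimensional example that satisfies a Polyak-Łojasiewicz inequality with a unique minimizer at `x = 0`. With zero gradient noise, the iterates contract geometrically toward the minimizer, matching the abstract's linear-convergence claim for the zero-variance case; the step size and noise level below are illustrative choices, not values from the paper.

```python
import math
import random

def grad_f(x):
    # Gradient of f(x) = x**2 + 3 * sin(x)**2, a nonconvex function
    # satisfying a Polyak-Lojasiewicz condition; unique minimizer x = 0.
    return 2.0 * x + 3.0 * math.sin(2.0 * x)

def sgd(x0, n_steps, step=0.1, noise_std=0.0, seed=0):
    """Plain SGD: x_{t+1} = x_t - step * (grad f(x_t) + noise).

    With noise_std = 0 this is deterministic gradient descent, and the
    PL condition yields linear (geometric) convergence toward x = 0.
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        g = grad_f(x) + rng.gauss(0.0, noise_std)  # unbiased gradient estimate
        x -= step * g
    return x

# Zero variance: iterates contract geometrically to the minimizer.
x_det = sgd(2.0, 200)
# Nonzero variance: SGD settles into a noise-dominated neighborhood of 0.
x_noisy = sgd(2.0, 200, noise_std=0.5, seed=1)
```

The noisy run does not converge to the exact minimizer under a constant step size; decaying step sizes of the kind analyzed in the paper are what drive the stochastic iterates all the way to a stationary point.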
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Lei, Yunwen | - |
dc.contributor.author | Hu, Ting | - |
dc.contributor.author | Li, Guiying | - |
dc.contributor.author | Tang, Ke | - |
dc.date.accessioned | 2023-08-09T03:34:21Z | - |
dc.date.available | 2023-08-09T03:34:21Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | IEEE Transactions on Neural Networks and Learning Systems, 2020, v. 31, n. 10, p. 4394-4400 | - |
dc.identifier.issn | 2162-237X | - |
dc.identifier.uri | http://hdl.handle.net/10722/329652 | - |
dc.description.abstract | Stochastic gradient descent (SGD) is a popular and efficient method with wide applications in training deep neural nets and other nonconvex models. While the behavior of SGD is well understood in the convex learning setting, the existing theoretical results for SGD applied to nonconvex objective functions are far from mature. For example, existing results require imposing a nontrivial assumption on the uniform boundedness of gradients for all iterates encountered in the learning process, which is hard to verify in practical implementations. In this article, we establish a rigorous theoretical foundation for SGD in nonconvex learning by showing that this boundedness assumption can be removed without affecting convergence rates, and that the standard smoothness assumption can be relaxed to Hölder continuity of gradients. In particular, we establish sufficient conditions for almost sure convergence as well as optimal convergence rates for SGD applied to both general nonconvex and gradient-dominated objective functions. Linear convergence is further derived in the case of zero variance. | - |
dc.language | eng | - |
dc.relation.ispartof | IEEE Transactions on Neural Networks and Learning Systems | - |
dc.subject | Learning theory | - |
dc.subject | nonconvex optimization | - |
dc.subject | Polyak-Łojasiewicz condition | - |
dc.subject | stochastic gradient descent (SGD) | - |
dc.title | Stochastic Gradient Descent for Nonconvex Learning without Bounded Gradient Assumptions | - |
dc.type | Article | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1109/TNNLS.2019.2952219 | - |
dc.identifier.pmid | 31831449 | - |
dc.identifier.scopus | eid_2-s2.0-85092680343 | - |
dc.identifier.volume | 31 | - |
dc.identifier.issue | 10 | - |
dc.identifier.spage | 4394 | - |
dc.identifier.epage | 4400 | - |
dc.identifier.eissn | 2162-2388 | - |