Conference Paper: An improved analysis of training over-parameterized deep neural networks
Title | An improved analysis of training over-parameterized deep neural networks |
---|---|
Authors | Zou, Difan; Gu, Quanquan |
Issue Date | 2019 |
Citation | Advances in Neural Information Processing Systems, 2019, v. 32 |
Abstract | A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for overparameterized (i.e., sufficiently wide) deep neural networks. However, the condition on the width of the neural network to ensure the global convergence is very stringent, which is often a high-degree polynomial in the training sample size n (e.g., O(n^24)). In this paper, we provide an improved analysis of the global convergence of (stochastic) gradient descent for training deep neural networks, which only requires a milder over-parameterization condition than previous work in terms of the training sample size and other problem-dependent parameters. The main technical contributions of our analysis include (a) a tighter gradient lower bound that leads to a faster convergence of the algorithm, and (b) a sharper characterization of the trajectory length of the algorithm. By specializing our result to two-layer (i.e., one-hidden-layer) neural networks, it also provides a milder over-parameterization condition than the best-known result in prior work. |
Persistent Identifier | http://hdl.handle.net/10722/316554 |
ISSN | 1049-5258 (2020 SCImago Journal Rankings: 1.399) |
ISI Accession Number ID | WOS:000534424302009 |
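The over-parameterized regime the abstract describes can be illustrated with a minimal sketch: gradient descent on a wide two-layer ReLU network whose hidden width m far exceeds the sample size n. This is an illustrative toy (random synthetic data, NTK-style 1/√m output scaling, fixed ±1 output weights), not the paper's actual construction or its stated width requirement; all names and hyperparameters below are choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 2000                     # width m much larger than sample size n
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
y = rng.standard_normal(n)                # synthetic regression targets

W = rng.standard_normal((m, d))           # trained hidden layer, random init
a = rng.choice([-1.0, 1.0], size=m)       # fixed output layer

def predict(W):
    # f(x) = (1/sqrt(m)) * sum_k a_k * relu(w_k . x)
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

lr = 0.2
losses = []
for _ in range(1000):
    resid = predict(W) - y
    losses.append(0.5 * np.dot(resid, resid))       # squared loss
    act = (X @ W.T > 0).astype(float)               # ReLU derivative, shape (n, m)
    # dL/dw_k = (1/sqrt(m)) * sum_i resid_i * a_k * 1[w_k.x_i > 0] * x_i
    grad_W = (act * resid[:, None] * a[None, :]).T @ X / np.sqrt(m)
    W -= lr * grad_W

print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.6f}")
```

In this regime the training loss decreases steadily toward zero, which is the qualitative behavior whose width requirement the paper's analysis sharpens.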
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zou, Difan | - |
dc.contributor.author | Gu, Quanquan | - |
dc.date.accessioned | 2022-09-14T11:40:44Z | - |
dc.date.available | 2022-09-14T11:40:44Z | - |
dc.date.issued | 2019 | - |
dc.identifier.citation | Advances in Neural Information Processing Systems, 2019, v. 32 | - |
dc.identifier.issn | 1049-5258 | - |
dc.identifier.uri | http://hdl.handle.net/10722/316554 | - |
dc.description.abstract | A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for overparameterized (i.e., sufficiently wide) deep neural networks. However, the condition on the width of the neural network to ensure the global convergence is very stringent, which is often a high-degree polynomial in the training sample size n (e.g., O(n^24)). In this paper, we provide an improved analysis of the global convergence of (stochastic) gradient descent for training deep neural networks, which only requires a milder over-parameterization condition than previous work in terms of the training sample size and other problem-dependent parameters. The main technical contributions of our analysis include (a) a tighter gradient lower bound that leads to a faster convergence of the algorithm, and (b) a sharper characterization of the trajectory length of the algorithm. By specializing our result to two-layer (i.e., one-hidden-layer) neural networks, it also provides a milder over-parameterization condition than the best-known result in prior work. | - |
dc.language | eng | - |
dc.relation.ispartof | Advances in Neural Information Processing Systems | - |
dc.title | An improved analysis of training over-parameterized deep neural networks | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_OA_fulltext | - |
dc.identifier.scopus | eid_2-s2.0-85090172477 | - |
dc.identifier.volume | 32 | - |
dc.identifier.isi | WOS:000534424302009 | - |