Conference Paper: An improved analysis of training over-parameterized deep neural networks
Title | An improved analysis of training over-parameterized deep neural networks |
---|---|
Authors | Zou, Difan; Gu, Quanquan |
Issue Date | 2019 |
Citation | Advances in Neural Information Processing Systems, 2019, v. 32 |
Abstract | A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for overparameterized (i.e., sufficiently wide) deep neural networks. However, the condition on the width of the neural network to ensure the global convergence is very stringent, which is often a high-degree polynomial in the training sample size n (e.g., O(n^24)). In this paper, we provide an improved analysis of the global convergence of (stochastic) gradient descent for training deep neural networks, which only requires a milder over-parameterization condition than previous work in terms of the training sample size and other problem-dependent parameters. The main technical contributions of our analysis include (a) a tighter gradient lower bound that leads to a faster convergence of the algorithm, and (b) a sharper characterization of the trajectory length of the algorithm. By specializing our result to two-layer (i.e., one-hidden-layer) neural networks, it also provides a milder over-parameterization condition than the best-known result in prior work. |
Persistent Identifier | http://hdl.handle.net/10722/316554 |
ISSN | 1049-5258 (2020 SCImago Journal Rankings: 1.399) |
ISI Accession Number ID | WOS:000534424302009 |
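The over-parameterized regime the abstract describes can be illustrated with a minimal sketch: gradient descent on a wide two-layer ReLU network whose hidden width m far exceeds the sample size n. This is an illustrative toy (random synthetic data, NTK-style 1/√m output scaling, fixed ±1 output weights), not the paper's actual construction or its stated width requirement; all names and hyperparameters below are choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 2000                     # width m much larger than sample size n
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
y = rng.standard_normal(n)                # synthetic regression targets

W = rng.standard_normal((m, d))           # trained hidden layer, random init
a = rng.choice([-1.0, 1.0], size=m)       # fixed output layer

def predict(W):
    # f(x) = (1/sqrt(m)) * sum_k a_k * relu(w_k . x)
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

lr = 0.2
losses = []
for _ in range(1000):
    resid = predict(W) - y
    losses.append(0.5 * np.dot(resid, resid))       # squared loss
    act = (X @ W.T > 0).astype(float)               # ReLU derivative, shape (n, m)
    # dL/dw_k = (1/sqrt(m)) * sum_i resid_i * a_k * 1[w_k.x_i > 0] * x_i
    grad_W = (act * resid[:, None] * a[None, :]).T @ X / np.sqrt(m)
    W -= lr * grad_W

print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.6f}")
```

In this regime the training loss decreases steadily toward zero, which is the qualitative behavior whose width requirement the paper's analysis sharpens.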
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Zou, Difan | - |
dc.contributor.author | Gu, Quanquan | - |
dc.date.accessioned | 2022-09-14T11:40:44Z | - |
dc.date.available | 2022-09-14T11:40:44Z | - |
dc.date.issued | 2019 | - |
dc.identifier.citation | Advances in Neural Information Processing Systems, 2019, v. 32 | - |
dc.identifier.issn | 1049-5258 | - |
dc.identifier.uri | http://hdl.handle.net/10722/316554 | - |
dc.description.abstract | A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for overparameterized (i.e., sufficiently wide) deep neural networks. However, the condition on the width of the neural network to ensure the global convergence is very stringent, which is often a high-degree polynomial in the training sample size n (e.g., O(n^24)). In this paper, we provide an improved analysis of the global convergence of (stochastic) gradient descent for training deep neural networks, which only requires a milder over-parameterization condition than previous work in terms of the training sample size and other problem-dependent parameters. The main technical contributions of our analysis include (a) a tighter gradient lower bound that leads to a faster convergence of the algorithm, and (b) a sharper characterization of the trajectory length of the algorithm. By specializing our result to two-layer (i.e., one-hidden-layer) neural networks, it also provides a milder over-parameterization condition than the best-known result in prior work. | - |
dc.language | eng | - |
dc.relation.ispartof | Advances in Neural Information Processing Systems | - |
dc.title | An improved analysis of training over-parameterized deep neural networks | - |
dc.type | Conference_Paper | - |
dc.description.nature | link_to_OA_fulltext | - |
dc.identifier.scopus | eid_2-s2.0-85090172477 | - |
dc.identifier.volume | 32 | - |
dc.identifier.isi | WOS:000534424302009 | - |