Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime

Zou, Difan; Wu, Jingfeng; Braverman, Vladimir; Gu, Quanquan; Kakade, Sham M

Appears in Collections:
- Computer Science: Conference papers

Conference Paper: Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime

Title	Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime
Authors	Zou, Difan; Wu, Jingfeng; Braverman, Vladimir; Gu, Quanquan; Kakade, Sham M
Issue Date	12-Dec-2022
Abstract	Stochastic gradient descent (SGD) has achieved great success due to its superior performance in both optimization and generalization. Most existing generalization analyses are made for single-pass SGD, which is a less practical variant than the commonly used multi-pass SGD. Moreover, theoretical analyses of multi-pass SGD often concern a worst-case instance in a class of problems, which may be too pessimistic to explain the superior generalization ability observed on particular problem instances. The goal of this paper is to sharply characterize the generalization of multi-pass SGD by developing an instance-dependent excess risk bound for least squares in the interpolation regime, expressed as a function of the iteration number, stepsize, and data covariance. We show that the excess risk of SGD can be exactly decomposed into the excess risk of gradient descent (GD) and a positive fluctuation error, suggesting that SGD always performs worse than GD, instance-wise, in generalization. On the other hand, we show that although SGD needs more iterations than GD to achieve the same level of excess risk, it requires fewer stochastic gradient evaluations and is therefore preferable in terms of computational time.
Persistent Identifier	http://hdl.handle.net/10722/339354
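
The comparison described in the abstract can be explored with a small simulation. The sketch below is illustrative only and is not taken from the paper: it assumes isotropic Gaussian features with more dimensions than samples, a squared loss, zero initialization, and multi-pass SGD that samples one training index uniformly with replacement per step. All function names and parameter values are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_data(n, d, noise_std, rng):
    """Assumed setup: isotropic Gaussian features, linear labels, d > n."""
    w_star = [rng.gauss(0.0, 1.0) / d ** 0.5 for _ in range(d)]
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [dot(x, w_star) + rng.gauss(0.0, noise_std) for x in X]
    return X, y, w_star


def train_loss(w, X, y):
    """Mean squared error on the training set."""
    return sum((dot(w, x) - t) ** 2 for x, t in zip(X, y)) / len(X)


def excess_risk(w, w_star):
    # With identity data covariance, the excess of the population risk
    # E[(y - <w, x>)^2] over its minimum equals ||w - w_star||^2.
    return sum((a - b) ** 2 for a, b in zip(w, w_star))


def gd(X, y, stepsize, iters):
    """Full-batch gradient descent on the halved mean squared error."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        residuals = [dot(w, x) - t for x, t in zip(X, y)]
        grad = [sum(r * x[j] for r, x in zip(residuals, X)) / n for j in range(d)]
        w = [wj - stepsize * gj for wj, gj in zip(w, grad)]
    return w


def multipass_sgd(X, y, stepsize, iters, rng):
    """SGD that revisits the training set: one uniformly sampled example per step."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        i = rng.randrange(n)  # sampling with replacement (an assumption)
        r = dot(w, X[i]) - y[i]
        w = [wj - stepsize * r * xj for wj, xj in zip(w, X[i])]
    return w


if __name__ == "__main__":
    rng = random.Random(0)
    X, y, w_star = make_data(n=20, d=40, noise_std=0.1, rng=rng)
    w_gd = gd(X, y, stepsize=0.01, iters=200)
    w_sgd = multipass_sgd(X, y, stepsize=0.01, iters=200, rng=rng)
    print("GD  excess risk:", excess_risk(w_gd, w_star))
    print("SGD excess risk:", excess_risk(w_sgd, w_star))
```

Each GD iteration here touches all n training examples while each SGD iteration touches one, so comparing the two at equal iteration counts versus equal gradient-evaluation counts corresponds to the two viewpoints the abstract contrasts. A single run is noisy; averaging over several seeds would be needed before drawing any conclusion.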

DC Field	Value	Language
dc.contributor.author	Zou, Difan	-
dc.contributor.author	Wu, Jingfeng	-
dc.contributor.author	Braverman, Vladimir	-
dc.contributor.author	Gu, Quanquan	-
dc.contributor.author	Kakade, Sham M	-
dc.date.accessioned	2024-03-11T10:35:56Z	-
dc.date.available	2024-03-11T10:35:56Z	-
dc.date.issued	2022-12-12	-
dc.identifier.uri	http://hdl.handle.net/10722/339354	-
dc.description.abstract	Stochastic gradient descent (SGD) has achieved great success due to its superior performance in both optimization and generalization. Most existing generalization analyses are made for single-pass SGD, which is a less practical variant than the commonly used multi-pass SGD. Moreover, theoretical analyses of multi-pass SGD often concern a worst-case instance in a class of problems, which may be too pessimistic to explain the superior generalization ability observed on particular problem instances. The goal of this paper is to sharply characterize the generalization of multi-pass SGD by developing an instance-dependent excess risk bound for least squares in the interpolation regime, expressed as a function of the iteration number, stepsize, and data covariance. We show that the excess risk of SGD can be exactly decomposed into the excess risk of gradient descent (GD) and a positive fluctuation error, suggesting that SGD always performs worse than GD, instance-wise, in generalization. On the other hand, we show that although SGD needs more iterations than GD to achieve the same level of excess risk, it requires fewer stochastic gradient evaluations and is therefore preferable in terms of computational time.	-
dc.language	eng	-
dc.relation.ispartof	Advances in Neural Information Processing Systems (28/11/2022-09/12/2022, New Orleans)	-
dc.title	Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime	-
dc.type	Conference_Paper	-
