Rethinking bias-variance trade-off for generalization of neural networks

Yang, Zitong; Yu, Yaodong; You, Chong; Steinhardt, Jacob; Ma, Yi

Scopus: eid_2-s2.0-85105410523

Supplementary

Citations:
- Scopus: 0
Appears in Collections:
- HKU Musketeers Foundation Institute of Data Science: Conference papers

Conference Paper: Rethinking bias-variance trade-off for generalization of neural networks

Title	Rethinking bias-variance trade-off for generalization of neural networks
Authors	Yang, Zitong; Yu, Yaodong; You, Chong; Steinhardt, Jacob; Ma, Yi
Issue Date	2020
Citation	37th International Conference on Machine Learning, ICML 2020, 2020, v. PartF168147-14, p. 10698-10708
Abstract	The classical bias-variance trade-off predicts that bias decreases and variance increases with model complexity, leading to a U-shaped risk curve. Recent work calls this into question for neural networks and other over-parameterized models, for which it is often observed that larger models generalize better. We provide a simple explanation for this by measuring the bias and variance of neural networks: while the bias is monotonically decreasing as in the classical theory, the variance is unimodal or bell-shaped: it increases then decreases with the width of the network. We vary the network architecture, loss function, and choice of dataset and confirm that variance unimodality occurs robustly for all models we considered. The risk curve is the sum of the bias and variance curves and displays different qualitative shapes depending on the relative scale of bias and variance, with the double descent curve observed in recent literature as a special case. We corroborate these empirical results with a theoretical analysis of two-layer linear networks with random first layer. Finally, evaluation on out-of-distribution data shows that most of the drop in accuracy comes from increased bias while variance increases by a relatively small amount. Moreover, we find that deeper models decrease bias and increase variance for both in-distribution and out-of-distribution data.
Persistent Identifier	http://hdl.handle.net/10722/327769
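
The abstract's statement that the risk curve is the sum of the bias and variance curves can be illustrated with a small numerical sketch. The code below is not the authors' experimental protocol. It is a toy setup loosely modeled on the two-layer linear network with a random first layer that the abstract mentions, and every name and constant in it (`bias_variance`, the input dimension `d = 10`, the noise level, the trial counts) is an assumption made for illustration. For each width it fits the second layer by least squares on independently drawn training sets, then estimates squared bias and variance of the predictions under squared loss against noiseless targets.

```python
import numpy as np


def bias_variance(width, n_trials=20, n_train=50, n_test=200, noise=0.1, seed=0):
    """Estimate (bias^2, variance, risk) for a toy two-layer linear model.

    Hypothetical setup: linear ground truth, Gaussian inputs, random first
    layer redrawn in every trial, second layer fit by least squares.
    """
    rng = np.random.default_rng(seed)
    d = 10  # input dimension (assumed for illustration)
    w_true = rng.standard_normal(d) / np.sqrt(d)
    x_test = rng.standard_normal((n_test, d))
    y_test = x_test @ w_true  # noiseless test targets

    preds = np.empty((n_trials, n_test))
    for t in range(n_trials):
        # Fresh training set and fresh random first layer for each trial.
        x = rng.standard_normal((n_train, d))
        y = x @ w_true + noise * rng.standard_normal(n_train)
        first = rng.standard_normal((d, width)) / np.sqrt(d)
        hidden = x @ first
        # lstsq returns the minimum-norm solution when the system is underdetermined.
        second, *_ = np.linalg.lstsq(hidden, y, rcond=None)
        preds[t] = (x_test @ first) @ second

    mean_pred = preds.mean(axis=0)
    bias2 = np.mean((mean_pred - y_test) ** 2)  # squared bias, averaged over test points
    variance = np.mean(preds.var(axis=0))       # variance across trials, averaged over test points
    risk = np.mean((preds - y_test) ** 2)       # average squared error over trials and test points
    return bias2, variance, risk


if __name__ == "__main__":
    for width in (2, 5, 10, 50, 200):
        b, v, r = bias_variance(width)
        print(f"width={width:4d}  bias^2={b:.4f}  variance={v:.4f}  risk={r:.4f}")
```

With these estimators the identity risk = bias^2 + variance holds exactly up to floating-point error, because the variance is computed around the same empirical mean prediction that defines the bias. Whether this toy model reproduces the unimodal variance curve reported in the abstract depends on the assumed constants and is not claimed here.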

DC Field	Value	Language
dc.contributor.author	Yang, Zitong	-
dc.contributor.author	Yu, Yaodong	-
dc.contributor.author	You, Chong	-
dc.contributor.author	Steinhardt, Jacob	-
dc.contributor.author	Ma, Yi	-
dc.date.accessioned	2023-05-08T02:26:41Z	-
dc.date.available	2023-05-08T02:26:41Z	-
dc.date.issued	2020	-
dc.identifier.citation	37th International Conference on Machine Learning, ICML 2020, 2020, v. PartF168147-14, p. 10698-10708	-
dc.identifier.uri	http://hdl.handle.net/10722/327769	-
dc.description.abstract	The classical bias-variance trade-off predicts that bias decreases and variance increases with model complexity, leading to a U-shaped risk curve. Recent work calls this into question for neural networks and other over-parameterized models, for which it is often observed that larger models generalize better. We provide a simple explanation for this by measuring the bias and variance of neural networks: while the bias is monotonically decreasing as in the classical theory, the variance is unimodal or bell-shaped: it increases then decreases with the width of the network. We vary the network architecture, loss function, and choice of dataset and confirm that variance unimodality occurs robustly for all models we considered. The risk curve is the sum of the bias and variance curves and displays different qualitative shapes depending on the relative scale of bias and variance, with the double descent curve observed in recent literature as a special case. We corroborate these empirical results with a theoretical analysis of two-layer linear networks with random first layer. Finally, evaluation on out-of-distribution data shows that most of the drop in accuracy comes from increased bias while variance increases by a relatively small amount. Moreover, we find that deeper models decrease bias and increase variance for both in-distribution and out-of-distribution data.	-
dc.language	eng	-
dc.relation.ispartof	37th International Conference on Machine Learning, ICML 2020	-
dc.title	Rethinking bias-variance trade-off for generalization of neural networks	-
dc.type	Conference_Paper	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.scopus	eid_2-s2.0-85105410523	-
dc.identifier.volume	PartF168147-14	-
dc.identifier.spage	10698	-
dc.identifier.epage	10708	-
