postgraduate thesis: Generalization analysis and regularization in over-parameterized models

Appears in Collections: HKU Theses Online (HKUTO)
Title | Generalization analysis and regularization in over-parameterized models |
---|---|
Authors | Meng, Xuran (孟徐然) |
Issue Date | 2024 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Meng, X. [孟徐然]. (2024). Generalization analysis and regularization in over-parameterized models. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | We study the success of over-parameterized models in both regression and classification tasks. In the regression task, we uncover the phenomenon of multiple descent in random feature models, where the test error follows a curve with multiple descents as the number of model parameters increases. In the classification task, we theoretically establish the capability of two-layer ReLU convolutional neural networks to learn complex XOR data. We find that these networks can achieve the Bayes-optimal test accuracy when the data signal-to-noise ratio (SNR) is high. Through our theoretical investigations, we discover that benign overfitting occurs only when the data set has a high SNR. Models trained on low-SNR data consistently exhibit poor test performance, indicating harmful overfitting of the training data set.
We also explore two regularization techniques that address the issue of harmful overfitting in low-SNR data sets for over-parameterized models. First, we investigate gradient regularization and its role during the training process. Our theoretical analysis reveals that gradient regularization effectively suppresses the memorization of noise within the model. Consequently, models trained with gradient regularization exhibit improved signal learning compared to models without this regularization technique. Second, we explore the use of early stopping as a regularization technique. By observing the spectra of weight matrices during training, we identify deviations from the Marchenko-Pastur law and find that these deviations indicate either the presence of sufficient training information or potential issues. As a result, we propose a spectral criterion that can guide early stopping during training.
Overall, this thesis highlights our investigations into the success of over-parameterized models in various learning tasks. We provide insights into the conditions under which these models perform well, and investigate several regularization techniques that mitigate harmful overfitting. |
Degree | Doctor of Philosophy |
Subject | Machine learning; Data mining |
Dept/Program | Statistics and Actuarial Science |
Persistent Identifier | http://hdl.handle.net/10722/345401 |
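The multiple-descent phenomenon mentioned in the abstract can be reproduced in miniature with a random feature regression. The sketch below is purely illustrative: the data dimensions, ReLU feature map, and model sizes are arbitrary choices for demonstration, not the thesis's actual experimental setup. Sweeping the number of random features `p` across the interpolation threshold `p = n_train` typically produces a spike in test error followed by a further descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: linear target observed with noise (illustrative setup).
n_train, n_test, d = 100, 200, 20
beta = rng.standard_normal(d) / np.sqrt(d)
X_tr = rng.standard_normal((n_train, d))
X_te = rng.standard_normal((n_test, d))
y_tr = X_tr @ beta + 0.1 * rng.standard_normal(n_train)
y_te = X_te @ beta

def rf_test_error(p):
    """Test MSE of minimum-norm least squares on p random ReLU features."""
    W = rng.standard_normal((d, p)) / np.sqrt(d)   # fixed random first layer
    F_tr = np.maximum(X_tr @ W, 0.0)               # ReLU random features
    F_te = np.maximum(X_te @ W, 0.0)
    theta = np.linalg.pinv(F_tr) @ y_tr            # min-norm interpolant
    return float(np.mean((F_te @ theta - y_te) ** 2))

# Sweep model size across the interpolation threshold p = n_train.
errors = {p: rf_test_error(p) for p in (10, 50, 100, 200, 400)}
```

Plotting `errors` against `p` gives the characteristic non-monotone test-error curve; the number and location of descents depend on the feature distribution and data geometry.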
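Gradient regularization, as described in the abstract, augments the training loss with a penalty on its own gradient norm. The following is a minimal sketch on a quadratic loss, where the penalty's gradient is available in closed form; the penalty strength `lam` and the toy problem are hypothetical choices, not the thesis's construction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy least-squares problem: L(w) = ||X w - y||^2 / (2 n).
n, d = 50, 10
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.5 * rng.standard_normal(n)

lam = 0.1                      # gradient-penalty strength (hypothetical)
H = X.T @ X / n                # Hessian of L, constant for a quadratic loss

def grad_loss(w):
    return X.T @ (X @ w - y) / n

def grad_regularized(w):
    # Objective: L(w) + lam * ||grad L(w)||^2.
    # For quadratic L, the penalty's gradient is 2 * lam * H @ grad L(w).
    g = grad_loss(w)
    return g + 2.0 * lam * H @ g

w = np.zeros(d)
for _ in range(2000):
    w -= 0.1 * grad_regularized(w)   # gradient descent on the regularized sum
```

Note that on a quadratic loss the penalty leaves the minimizer unchanged and only reshapes the training dynamics; the noise-suppression effect analyzed in the thesis concerns non-convex models, where the dynamics matter.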
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Meng, Xuran | - |
dc.contributor.author | 孟徐然 | - |
dc.date.accessioned | 2024-08-26T08:59:32Z | - |
dc.date.available | 2024-08-26T08:59:32Z | - |
dc.date.issued | 2024 | - |
dc.identifier.citation | Meng, X. [孟徐然]. (2024). Generalization analysis and regularization in over-parameterized models. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/345401 | - |
dc.description.abstract | We study the success of over-parameterized models in both regression and classification tasks. In the regression task, we uncover the phenomenon of multiple descent in random feature models, where the test error follows a curve with multiple descents as the number of model parameters increases. In the classification task, we theoretically establish the capability of two-layer ReLU convolutional neural networks to learn complex XOR data. We find that these networks can achieve the Bayes-optimal test accuracy when the data signal-to-noise ratio (SNR) is high. Through our theoretical investigations, we discover that benign overfitting occurs only when the data set has a high SNR. Models trained on low-SNR data consistently exhibit poor test performance, indicating harmful overfitting of the training data set. We also explore two regularization techniques that address the issue of harmful overfitting in low-SNR data sets for over-parameterized models. First, we investigate gradient regularization and its role during the training process. Our theoretical analysis reveals that gradient regularization effectively suppresses the memorization of noise within the model. Consequently, models trained with gradient regularization exhibit improved signal learning compared to models without this regularization technique. Second, we explore the use of early stopping as a regularization technique. By observing the spectra of weight matrices during training, we identify deviations from the Marchenko-Pastur law and find that these deviations indicate either the presence of sufficient training information or potential issues. As a result, we propose a spectral criterion that can guide early stopping during training. Overall, this thesis highlights our investigations into the success of over-parameterized models in various learning tasks. We provide insights into the conditions under which these models perform well, and investigate several regularization techniques that mitigate harmful overfitting. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Machine learning | - |
dc.subject.lcsh | Data mining | - |
dc.title | Generalization analysis and regularization in over-parameterized models | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Statistics and Actuarial Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2024 | - |
dc.identifier.mmsid | 991044843665703414 | - |
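The spectral early-stopping idea in the abstract can be illustrated by checking a weight matrix's eigenvalue spectrum against the Marchenko-Pastur bulk expected of an i.i.d. random matrix. The check below is a generic sketch, not the thesis's actual criterion: the matrix shapes, the rank-one "signal" spike, and its strength are arbitrary illustrative choices.

```python
import numpy as np

def mp_outlier_fraction(W):
    """Fraction of eigenvalues of W.T @ W / n lying outside the
    Marchenko-Pastur bulk predicted for an i.i.d. matrix with the
    same shape and empirical entry variance."""
    n, p = W.shape
    gamma = p / n
    sigma2 = W.var()
    evals = np.linalg.eigvalsh(W.T @ W / n)
    lower = sigma2 * (1.0 - np.sqrt(gamma)) ** 2
    upper = sigma2 * (1.0 + np.sqrt(gamma)) ** 2
    return float(np.mean((evals < lower) | (evals > upper)))

rng = np.random.default_rng(0)
W_init = rng.standard_normal((1000, 200))      # i.i.d. "untrained" weights
frac_init = mp_outlier_fraction(W_init)        # close to zero: pure bulk

# A rank-one "signal" direction pushes an eigenvalue out of the bulk,
# mimicking the spectral deviations that appear once training has
# absorbed structure from the data.
u = rng.standard_normal(1000); u /= np.linalg.norm(u)
v = rng.standard_normal(200);  v /= np.linalg.norm(v)
W_trained = W_init + 200.0 * np.outer(u, v)
frac_trained = mp_outlier_fraction(W_trained)  # strictly positive
```

A criterion in this spirit would monitor the outlier fraction during training and stop once it stabilizes or signals a problem; how the thesis formalizes "sufficient training information" versus "potential issues" is specified in the thesis itself, not in this sketch.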