Appears in Collections: postgraduate thesis: Variable selection and prediction for local polynomial regression
Field | Value |
---|---|
Title | Variable selection and prediction for local polynomial regression |
Authors | Cheung, Kin Yap (張建熠) |
Advisors | Lee, SMS |
Issue Date | 2020 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Cheung, K. Y. [張建熠]. (2020). Variable selection and prediction for local polynomial regression. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Local polynomial estimation has a long history in nonparametric regression and has been generalised to many other regression models. However, it inescapably suffers from the curse of dimensionality, since the size of the local neighbourhood, which is controlled by the bandwidths, decreases exponentially as the covariate dimension increases. To address this problem, we propose a bandwidth regularisation scheme that removes irrelevant variables. We first discuss the proposed procedure in the general nonparametric regression setting with a convex loss for high-dimensional data. Selection consistency and the asymptotic properties of the local linear estimator based on the optimised bandwidths are established. Then a modified Nadaraya–Watson estimator is proposed for variable selection in a nonparametric setting with missing data, where a covariate may be missing either because its value is hidden from the observer or because it is inapplicable to the particular subject being observed. The method allows information sharing across different missing patterns without affecting consistency of the estimator. Unlike conventional methods such as those based on imputation or likelihoods, our method requires only mild assumptions on the model and the missing mechanism. For prediction, we focus on finding the variables relevant for predicting mean responses, conditional on covariate vectors subject to a given type of missingness. The final problem is dimension reduction. The above procedure is extended to bandwidth matrix optimisation to perform variable selection, dimension reduction and optimal estimation at the oracle convergence rate, all in one go. Compared with most existing methods, the new procedure does not require explicit bandwidth selection or an additional dimension-determination step based on techniques such as cross-validation or principal components. The selected model is guaranteed a convergence rate no worse than that of the oracle model. |
Degree | Doctor of Philosophy |
Subject | Regression analysis; Nonparametric statistics |
Dept/Program | Statistics and Actuarial Science |
Persistent Identifier | http://hdl.handle.net/10722/297522 |
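As background for the mechanism the abstract refers to, the display below is a minimal textbook sketch (not the thesis's specific regularised criterion) of the Nadaraya–Watson estimator with a product kernel and one bandwidth per covariate. Letting a covariate's bandwidth grow without bound removes that covariate from the local weights, which is the sense in which choosing bandwidths under a suitable penalty can discard irrelevant variables.

```latex
% Standard Nadaraya--Watson estimator with a product kernel and
% per-covariate bandwidths h = (h_1, ..., h_d). Shown for illustration
% only; the thesis's regularised bandwidth criterion is not reproduced.
\[
  \hat{m}_h(x)
    = \frac{\sum_{i=1}^{n} K_h(x - X_i)\, Y_i}
           {\sum_{i=1}^{n} K_h(x - X_i)},
  \qquad
  K_h(u) = \prod_{j=1}^{d} \frac{1}{h_j}\, K\!\left(\frac{u_j}{h_j}\right).
\]
% As h_j -> infinity, the factor K(u_j / h_j) / h_j tends to K(0) / h_j
% for every observation, a common factor that cancels between numerator
% and denominator, so the estimator no longer depends on covariate j.
```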
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Lee, SMS | - |
dc.contributor.author | Cheung, Kin Yap | - |
dc.contributor.author | 張建熠 | - |
dc.date.accessioned | 2021-03-21T11:38:01Z | - |
dc.date.available | 2021-03-21T11:38:01Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | Cheung, K. Y. [張建熠]. (2020). Variable selection and prediction for local polynomial regression. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/297522 | - |
dc.description.abstract | Local polynomial estimation has a long history in nonparametric regression and has been generalised to many other regression models. However, it inescapably suffers from the curse of dimensionality, since the size of the local neighbourhood, which is controlled by the bandwidths, decreases exponentially as the covariate dimension increases. To address this problem, we propose a bandwidth regularisation scheme that removes irrelevant variables. We first discuss the proposed procedure in the general nonparametric regression setting with a convex loss for high-dimensional data. Selection consistency and the asymptotic properties of the local linear estimator based on the optimised bandwidths are established. Then a modified Nadaraya–Watson estimator is proposed for variable selection in a nonparametric setting with missing data, where a covariate may be missing either because its value is hidden from the observer or because it is inapplicable to the particular subject being observed. The method allows information sharing across different missing patterns without affecting consistency of the estimator. Unlike conventional methods such as those based on imputation or likelihoods, our method requires only mild assumptions on the model and the missing mechanism. For prediction, we focus on finding the variables relevant for predicting mean responses, conditional on covariate vectors subject to a given type of missingness. The final problem is dimension reduction. The above procedure is extended to bandwidth matrix optimisation to perform variable selection, dimension reduction and optimal estimation at the oracle convergence rate, all in one go. Compared with most existing methods, the new procedure does not require explicit bandwidth selection or an additional dimension-determination step based on techniques such as cross-validation or principal components. The selected model is guaranteed a convergence rate no worse than that of the oracle model. | -
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Regression analysis | - |
dc.subject.lcsh | Nonparametric statistics | - |
dc.title | Variable selection and prediction for local polynomial regression | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Statistics and Actuarial Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2021 | - |
dc.identifier.mmsid | 991044351383703414 | - |