postgraduate thesis: Dimension reduction via projection and its applications
Title | Dimension reduction via projection and its applications |
---|---|
Authors | Si, Yuefeng (斯越峰) |
Advisors | Li, G |
Issue Date | 2024 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Si, Y. [斯越峰]. (2024). Dimension reduction via projection and its applications. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | In the era of big data, high-dimensional data sets are widely encountered in genomics, time series analysis and machine learning. Projection techniques from dimension reduction are efficient at compressing the number of features while supporting statistical interpretation. With the advancement of modern technology, parametric tensor decomposition and nonparametric angle-based distances have become popular projection tools.
In the first part of the thesis, a newly proposed tensor train (TT) decomposition is used to compress the parametric subspace of tensor regression. Many existing models for high-dimensional data are based on Tucker decomposition, which has good properties but quickly loses its efficiency in compressing tensors as the order of the tensors increases, say beyond four or five. We propose a modified TT decomposition and apply it to tensor regression so that a clear statistical interpretation can be obtained. The new tensor regression fits data with hierarchical structures well. More importantly, it can be easily applied to higher-order tensors, since TT decomposition compresses the coefficient tensors much more efficiently. The methodology is also extended to tensor autoregression for time series data, and nonasymptotic properties are derived for the ordinary least squares estimators of both tensor regression and autoregression. A new algorithm is introduced to search for the estimators, and its theoretical justification is also discussed. The theoretical and computational properties of the proposed methodology are verified by simulation studies, and its advantages over existing methods are illustrated by two real examples.
Secondly, a novel projection mean variance (PMV) measure from a nonparametric model is used to test the multi-sample hypothesis of equal distributions for univariate or multivariate responses. The proposed PMV measure generalizes the mean variance index using a projection technique. The PMV measure yields an analogous variance component decomposition. Using this decomposition, an ANOVA F statistic is derived for the multi-sample problem. The proposed test is statistically consistent against general alternatives and robust to heavy-tailed data. The test is free of tuning parameters and does not require moment conditions on the response. Our simulation results demonstrate that the PMV test has higher power than classical Wilks-type methods and the DISCO test, especially when the dimension of the response is relatively large or the moment conditions required by the DISCO test are violated. We further illustrate our method using empirical analyses of two real data sets.
Lastly, projection quantile correlation (PQC) is proposed to detect quantile dependence between a response and multivariate predictors at a given quantile level. We then use the measure to select grouped predictors that contribute to the conditional quantile of the response for high-dimensional data with group structures. The sure independence screening property is established for the group screening method. We illustrate the finite-sample performance of the proposed method through simulations and an application to a data set. |
Degree | Doctor of Philosophy |
Subject | Dimension reduction (Statistics) |
Dept/Program | Statistics and Actuarial Science |
Persistent Identifier | http://hdl.handle.net/10722/343765 |
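
The abstract's remark that Tucker decomposition compresses poorly once the tensor order exceeds four or five can be illustrated by counting parameters. The sketch below is not taken from the thesis: it uses the standard Tucker and standard tensor train formats (not the modified TT decomposition the thesis proposes), and it assumes every mode has the same dimension `dim` and every rank equals `rank`. The function names `tucker_params` and `tt_params` are hypothetical.

```python
def tucker_params(order: int, dim: int, rank: int) -> int:
    """Parameter count of a standard Tucker format (assumed equal dims and ranks).

    One core tensor with rank**order entries, plus one dim x rank
    factor matrix for each of the `order` modes.
    """
    return rank ** order + order * dim * rank


def tt_params(order: int, dim: int, rank: int) -> int:
    """Parameter count of a standard tensor train format (assumed equal dims and ranks).

    The two boundary cores are dim x rank (boundary TT ranks equal 1);
    each of the order - 2 interior cores is rank x dim x rank.
    """
    if order == 1:
        return dim
    return 2 * dim * rank + (order - 2) * rank * dim * rank


if __name__ == "__main__":
    # Illustrative sizes only: mode dimension 10, rank 4.
    for order in (3, 5, 8):
        print(order, tucker_params(order, 10, 4), tt_params(order, 10, 4))
```

Under these assumptions the Tucker count grows exponentially in the order because of the core tensor, while the TT count grows linearly: with `dim = 10` and `rank = 4`, an order-8 coefficient tensor needs 65,856 Tucker parameters but 1,040 TT parameters.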
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Li, G | - |
dc.contributor.author | Si, Yuefeng | - |
dc.contributor.author | 斯越峰 | - |
dc.date.accessioned | 2024-06-06T01:04:49Z | - |
dc.date.available | 2024-06-06T01:04:49Z | - |
dc.date.issued | 2024 | - |
dc.identifier.citation | Si, Y. [斯越峰]. (2024). Dimension reduction via projection and its applications. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/343765 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Dimension reduction (Statistics) | - |
dc.title | Dimension reduction via projection and its applications | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Statistics and Actuarial Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2024 | - |
dc.identifier.mmsid | 991044809205603414 | - |