
postgraduate thesis: Nonparametric and adaptive tests for high-dimensional data

Title: Nonparametric and adaptive tests for high-dimensional data
Authors: Qu, Yidi [曲一迪]
Advisor(s): Xu, J; Yuen, KC
Issue Date: 2022
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Qu, Y. [曲一迪]. (2022). Nonparametric and adaptive tests for high-dimensional data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: The study of population means has attracted considerable interest in recent years, especially for high-dimensional multivariate data. Because of the curse of dimensionality, however, traditional mean tests break down: when the dimension exceeds the sample size, the sample covariance matrix is singular and therefore not invertible. In this thesis, a two-sample test and a K-sample test (K >= 3) are proposed for testing the equality of high-dimensional means. For two-sample tests of high-dimensional mean vectors, classical methods are typically designed to detect either sparse or dense mean differences. However, the sparsity level of the mean differences is often unknown. Moreover, the mean differences can have varying magnitudes, whereas the existing literature often assumes them to be equal. It is therefore reasonable to develop a robust test that performs relatively well without assuming that the mean differences are dense or sparse, and without assuming equal magnitude for each component. For this purpose, this thesis proposes a new test consisting of two steps: dynamically allocating weights to components with varying magnitudes, and combining multiple weighted component tests (WCTs) so that the test adapts to different sparsity levels of the mean differences. The proposed adaptive weighted component test (AWCT) can be viewed as a generalization of the generalized component test (GCT), which puts equal weight on each component. The AWCT also shares an idea with the adaptive sum of powered score (ASPU) test, namely optimizing power over a class of tests. The asymptotic properties of the proposed test are studied, and both simulation studies and real examples demonstrate that it achieves good overall performance across a variety of signal sparsity levels. By comparison, existing approaches often cater to a particular situation in which signals are either sparse or dense.
For K-sample tests of mean equality, especially for time-ordered data, most existing methods are designed for low-dimensional data and are not applicable in the high-dimension, low-sample-size regime. Therefore, this thesis proposes a new K-sample test for the high-dimensional setting. The proposed test relies on neither the normality assumption nor computationally expensive permutation methods. Furthermore, by using a forward search algorithm complemented by the adaptive lasso, the proposed method can also detect the position and duration of signal(s) while remaining robust to the noise accumulated from the high-dimensional covariates. The superior performance of the proposed method is illustrated by comparisons with existing methods in both numerical simulations and a real dataset collected from a semiconductor manufacturing process.
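The abstract's opening point, that covariance-based mean tests break down in high dimensions, can be illustrated with a small sketch. It is not the thesis's AWCT or GCT. It only shows, on simulated data, that the pooled sample covariance matrix has rank at most n1 + n2 - 2 and so cannot be inverted when the dimension p is larger. It then computes a simple statistic that uses per-component variances only. The function names, the sample sizes, and the choice of a sum of squared standardized differences are all assumptions made for illustration.

```python
import numpy as np


def pooled_covariance(x, y):
    """Pooled sample covariance of two samples (rows are observations)."""
    n1, n2 = x.shape[0], y.shape[0]
    s1 = np.cov(x, rowvar=False)
    s2 = np.cov(y, rowvar=False)
    return ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)


def diagonal_statistic(x, y):
    """Sum of squared standardized mean differences, one term per component.

    Hypothetical illustration: only the diagonal of the pooled covariance
    is used, so no matrix inversion is required.
    """
    n1, n2 = x.shape[0], y.shape[0]
    diff = x.mean(axis=0) - y.mean(axis=0)
    pooled_var = (
        (n1 - 1) * x.var(axis=0, ddof=1) + (n2 - 1) * y.var(axis=0, ddof=1)
    ) / (n1 + n2 - 2)
    return float(np.sum(diff**2 / (pooled_var * (1.0 / n1 + 1.0 / n2))))


# Simulated data with more dimensions than observations (assumed sizes).
rng = np.random.default_rng(0)
n1, n2, p = 10, 12, 50
x = rng.normal(size=(n1, p))
y = rng.normal(size=(n2, p))

s = pooled_covariance(x, y)
rank = int(np.linalg.matrix_rank(s))
stat = diagonal_statistic(x, y)

print(f"dimension p = {p}, rank of pooled covariance = {rank}")
print(f"diagonal-only statistic = {stat:.3f}")
```

Because the rank is bounded by n1 + n2 - 2, a Hotelling-type statistic that needs the inverse of the full covariance matrix is undefined here, while the component-wise statistic remains computable. How components are weighted and how several such tests are combined is what the thesis addresses, and that is not reproduced in this sketch.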
Degree: Doctor of Philosophy
Subject: Mathematical statistics
Dept/Program: Statistics and Actuarial Science
Persistent Identifier: http://hdl.handle.net/10722/324402

 

DC Field | Value | Language
dc.contributor.advisor | Xu, J | -
dc.contributor.advisor | Yuen, KC | -
dc.contributor.author | Qu, Yidi | -
dc.contributor.author | 曲一迪 | -
dc.date.accessioned | 2023-02-03T02:11:35Z | -
dc.date.available | 2023-02-03T02:11:35Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Qu, Y. [曲一迪]. (2022). Nonparametric and adaptive tests for high-dimensional data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/324402 | -
dc.description.abstract | Same text as the Abstract above | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Mathematical statistics | -
dc.title | Nonparametric and adaptive tests for high-dimensional data | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Statistics and Actuarial Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2023 | -
dc.identifier.mmsid | 991044634608303414 | -
