postgraduate thesis: Contributions to high-dimensional statistical analysis
Title | Contributions to high-dimensional statistical analysis |
---|---|
Authors | Li, Zhaoyuan (李兆媛) |
Issue Date | 2016 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Li, Z. [李兆媛]. (2016). Contributions to high-dimensional statistical analysis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | In this thesis, for several important high-dimensional problems where the dimension is large in comparison with the sample size, new methodologies are investigated with new limiting results and meaningful applications.
In the first problem, I generalise two simple but effective procedures, the determinant-based and trace-based criteria, to general populations for high-dimensional classification. Their asymptotic misclassification probabilities are derived using the theory of large-dimensional random matrices. One of the main results is that, in some situations, the misclassification probability cannot vanish even if the sample size becomes very large. The performance of these two criteria is explored for various structures of the mean vector and covariance.
In the second problem, I study the question of testing independence between two large sets of variates. The main application here is to infer gene regulatory networks from gene expression data, separately for normal and diseased populations. The networks are constructed by testing independence between pairs of genes, and the test statistic is constructed from the trace of a suitable large random matrix. Compared to traditional statistical methods, this new method successfully identifies important connections between genes in both normal and diseased samples.
In the third problem, I develop new statistical theory for probabilistic principal component analysis models in high dimensions. An accurate estimator of the noise variance is proposed. Using random-matrix theory, the asymptotic normality of this estimator is established for both the Gaussian and non-Gaussian cases. In addition, based on this new estimator of the noise variance, I develop several important applications, including constructing a new criterion for determining the number of principal components and deriving new asymptotics for the related goodness-of-fit statistic.
In the last problem, I propose new tests to detect the existence of heteroscedasticity in high-dimensional linear regression. Using the theory of random Haar orthogonal matrices, the asymptotic normality of the test statistics is obtained under the null hypothesis and the assumption that the degrees of freedom of the model tend to infinity. These new tests are dimension-proof, which makes them widely applicable across different combinations of sample size and dimension. Extensive Monte Carlo experiments and real data analyses demonstrate the superiority of the proposed tests over traditional methods in terms of size and power. |
Degree | Doctor of Philosophy |
Subject | Mathematical statistics |
Dept/Program | Statistics and Actuarial Science |
Persistent Identifier | http://hdl.handle.net/10722/235899 |
HKU Library Item ID | b5801645 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Li, Zhaoyuan | - |
dc.contributor.author | 李兆媛 | - |
dc.date.accessioned | 2016-11-09T23:26:59Z | - |
dc.date.available | 2016-11-09T23:26:59Z | - |
dc.date.issued | 2016 | - |
dc.identifier.citation | Li, Z. [李兆媛]. (2016). Contributions to high-dimensional statistical analysis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/235899 | - |
dc.description.abstract | In this thesis, for several important high-dimensional problems where the dimension is large in comparison with the sample size, new methodologies are investigated with new limiting results and meaningful applications. In the first problem, I generalise two simple but effective procedures, the determinant-based and trace-based criteria, to general populations for high-dimensional classification. Their asymptotic misclassification probabilities are derived using the theory of large-dimensional random matrices. One of the main results is that, in some situations, the misclassification probability cannot vanish even if the sample size becomes very large. The performance of these two criteria is explored for various structures of the mean vector and covariance. In the second problem, I study the question of testing independence between two large sets of variates. The main application here is to infer gene regulatory networks from gene expression data, separately for normal and diseased populations. The networks are constructed by testing independence between pairs of genes, and the test statistic is constructed from the trace of a suitable large random matrix. Compared to traditional statistical methods, this new method successfully identifies important connections between genes in both normal and diseased samples. In the third problem, I develop new statistical theory for probabilistic principal component analysis models in high dimensions. An accurate estimator of the noise variance is proposed. Using random-matrix theory, the asymptotic normality of this estimator is established for both the Gaussian and non-Gaussian cases. In addition, based on this new estimator of the noise variance, I develop several important applications, including constructing a new criterion for determining the number of principal components and deriving new asymptotics for the related goodness-of-fit statistic. In the last problem, I propose new tests to detect the existence of heteroscedasticity in high-dimensional linear regression. Using the theory of random Haar orthogonal matrices, the asymptotic normality of the test statistics is obtained under the null hypothesis and the assumption that the degrees of freedom of the model tend to infinity. These new tests are dimension-proof, which makes them widely applicable across different combinations of sample size and dimension. Extensive Monte Carlo experiments and real data analyses demonstrate the superiority of the proposed tests over traditional methods in terms of size and power. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.subject.lcsh | Mathematical statistics | - |
dc.title | Contributions to high-dimensional statistical analysis | - |
dc.type | PG_Thesis | - |
dc.identifier.hkul | b5801645 | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Statistics and Actuarial Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_b5801645 | - |
dc.identifier.mmsid | 991020813029703414 | - |