
postgraduate thesis: Spike detection : random matrix theory applications on high-dimensional data

Title: Spike detection : random matrix theory applications on high-dimensional data
Authors: Xu, Yuyang (徐毓阳)
Issue Date: 2022
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Xu, Y. [徐毓阳]. (2022). Spike detection : random matrix theory applications on high-dimensional data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: This thesis investigates spike detection on high-dimensional data using Random Matrix Theory (RMT). In the first part, we study a “mysterious” phase transition phenomenon raised by Nakatsukasa et al. (2013) in the spectra of the graph Laplacian matrices of dendrite graphs from biological experiments on mouse retinal ganglion cells. While the bulk of the spectrum can be well understood through structures resembling starlike trees, the spikes, that is, isolated eigenvalues outside the bulk spectrum, remain unexplained. We bring new insights to these mysteries by considering a class of uniform trees. Exact relationships between the number of such spikes and the number of T-junctions are analyzed as a function of the number of vertices separating the T-junctions. Using these theoretical results, we propose predictions for the number of spikes observed in real-life dendrite graphs. These predictions match the observed numbers of spikes well, thus confirming the practical value of our theoretical results. In the second part, we introduce a method called ERStruct to estimate the number of top informative principal components (PCs) in whole-genome sequencing data, accounting for the complicated linkage disequilibrium (LD) structure between genetic markers. The traditional method by Patterson, Price, and Reich (2006) has two important issues. First, the number of genetic variants p is much larger than the sample size n in sequencing data, so that the sample-to-marker ratio n/p is nearly zero, violating the assumption of the Tracy-Widom test used in their method. Second, their method might not handle the linkage disequilibrium in sequencing data well. To resolve these two practical issues, ERStruct uses the ratio of consecutive eigenvalues as a more robust test statistic and approximates its null distribution using modern random matrix theory. Both simulation studies and applications to two public data sets, from the HapMap 3 and the 1000 Genomes Projects, demonstrate the empirical performance of our ERStruct method.
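The eigenvalue-ratio idea mentioned in the abstract can be illustrated with a small sketch. This is not the ERStruct implementation: the function name, the toy data, and the rule of taking the smallest ratio are assumptions made only for illustration, and the thesis's approximation of the null distribution is not reproduced here.

```python
import numpy as np

def consecutive_eigenvalue_ratios(X):
    """Ratios lambda_{i+1} / lambda_i of the sorted sample eigenvalues.

    X is an n x p data matrix (n samples, p markers). Hypothetical helper,
    not taken from the thesis.
    """
    n, p = X.shape
    # Center each column (marker) before forming the covariance-type matrix.
    Xc = X - X.mean(axis=0, keepdims=True)
    # Use the n x n Gram matrix, which is cheap to decompose when p >> n.
    gram = Xc @ Xc.T / p
    eigvals = np.linalg.eigvalsh(gram)[::-1]  # descending order
    # Column centering forces one numerically zero eigenvalue; drop it.
    eigvals = eigvals[:-1]
    return eigvals[1:] / eigvals[:-1]

# Toy data (an assumption): pure noise plus k strong latent components.
rng = np.random.default_rng(0)
n, p, k = 50, 5000, 3
scores = rng.normal(size=(n, k)) * np.array([3.0, 2.5, 2.0])
loadings = rng.normal(size=(k, p))
X = rng.normal(size=(n, p)) + scores @ loadings

ratios = consecutive_eigenvalue_ratios(X)
# Simplified stand-in for a formal test: the sharpest drop between
# consecutive eigenvalues marks the boundary between spikes and bulk.
estimated_k = int(np.argmin(ratios)) + 1
print(estimated_k)
```

In this toy setting the ratios within the bulk stay close to 1, while the ratio at the boundary between the last spike and the bulk is far smaller. A formal procedure would compare each ratio against a null distribution rather than simply taking the minimum.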
Degree: Doctor of Philosophy
Subject: Random matrices
Dept/Program: Statistics and Actuarial Science
Persistent Identifier: http://hdl.handle.net/10722/325819

 

DC Field | Value | Language
dc.contributor.author | Xu, Yuyang | -
dc.contributor.author | 徐毓阳 | -
dc.date.accessioned | 2023-03-02T16:33:05Z | -
dc.date.available | 2023-03-02T16:33:05Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Xu, Y. [徐毓阳]. (2022). Spike detection : random matrix theory applications on high-dimensional data. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/325819 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Random matrices | -
dc.title | Spike detection : random matrix theory applications on high-dimensional data | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Statistics and Actuarial Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2022 | -
dc.identifier.mmsid | 991044649902603414 | -
