postgraduate thesis: Study of survival models with infinite parameter space and its application in network analysis
Title | Study of survival models with infinite parameter space and its application in network analysis |
---|---|
Authors | Zhou, Yunpeng (周云鵬) |
Advisors | Yuen, KC |
Issue Date | 2023 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Zhou, Y. [周云鵬]. (2023). Study of survival models with infinite parameter space and its application in network analysis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | High-dimensional data are common in survival analysis and call for survival models with an infinite parameter space. For example, genomic data such as DNA microarray data are frequently used for risk prediction, yet the number of parameters p is typically larger than the number of observations n. In datasets with a large number of covariates, the parameter space often exhibits sparsity or homogeneity. It is therefore crucial to develop methods that estimate the coefficients accurately and identify the parameters that significantly affect the hazard rate. Penalized regression such as the LASSO is a common choice for variable selection in practice. To improve estimation accuracy, this thesis proposes two algorithms that approximate the solution to ℓ0-penalized regression. Both methods perform well in selecting the subset of relevant parameters, especially in controlling the false positive rate. In addition, since the hazard rate of a survival model describes the frequency of event occurrence, it is natural to extend its application to network analysis, where it can describe the frequency of communication between individuals. Recurrent network event data are most relevant for studying phenomena that involve repeated interactions between subjects over time, such as communication networks or social networks. Analyzing such data is more complex than analyzing static network data, because the effects of network structure and temporal dynamics must be analyzed simultaneously. We propose new approaches that use two separate sets of parameters to account for degree heterogeneity and homophily, respectively. The baseline intensity function is left completely unspecified to flexibly capture the time-varying pattern of the underlying process. Under a semi-parametric model, we apply the fused smoothly clipped absolute deviation (SCAD) penalty to identify groups. To incorporate more of the network's dynamic structure, we then propose a fully nonparametric model based on a counting process with time-varying parameters. Simulation studies are carried out to verify the consistency and asymptotic properties of the proposed models and to evaluate their finite-sample performance. The models are also applied to several network datasets for illustration. |
Degree | Doctor of Philosophy |
Subject | Survival analysis (Biometry) |
Dept/Program | Statistics and Actuarial Science |
Persistent Identifier | http://hdl.handle.net/10722/342895 |
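For orientation, generic textbook forms of the two estimation problems named in the abstract are written out below. This is a sketch under stated assumptions, not the thesis's own notation or objective: the survival data $(T_i, \delta_i, x_i)$ (observed time, event indicator, covariates in $\mathbb{R}^p$), the no-ties partial likelihood, the tuning parameters $\mu$ and $\tau$, and the undirected network intensity with node-specific degree effects $\alpha_i$ and homophily coefficients $\gamma$ on pair covariates $Z_{ij}(t)$ are all illustrative choices. Because $\lambda_0(t)$ is left unspecified, a common shift of every $\alpha_i$ could be absorbed into it, so some identifiability constraint (here, assumed to be $\sum_i \alpha_i = 0$) is needed; $\ell(\alpha, \gamma)$ denotes the corresponding log partial likelihood.

```latex
% Generic l0-penalized Cox regression (illustrative; assumes no tied event times)
\hat\beta = \arg\min_{\beta \in \mathbb{R}^p}
  \Big\{ -\ell_n(\beta) + \mu \|\beta\|_0 \Big\}, \qquad
\ell_n(\beta) = \sum_{i=1}^{n} \delta_i \Big[ x_i^\top\beta
  - \log \sum_{k:\,T_k \ge T_i} \exp\big(x_k^\top\beta\big) \Big], \qquad
\|\beta\|_0 = \#\{ j : \beta_j \ne 0 \}

% Illustrative Cox-type intensity for recurrent events between nodes i and j
\lambda_{ij}(t) = \lambda_0(t)\,
  \exp\big( \alpha_i + \alpha_j + Z_{ij}(t)^\top \gamma \big), \qquad
\sum_{i} \alpha_i = 0

% Fused SCAD-penalized objective for grouping the degree effects
(\hat\alpha, \hat\gamma) = \arg\min_{\alpha, \gamma}
  \Big\{ -\ell(\alpha, \gamma) + \sum_{i<j} p_\tau\big(|\alpha_i - \alpha_j|\big) \Big\}

% SCAD penalty (Fan and Li, 2001), with a > 2
p_\tau(\theta) =
\begin{cases}
  \tau\theta, & 0 \le \theta \le \tau, \\
  \dfrac{2a\tau\theta - \theta^2 - \tau^2}{2(a-1)}, & \tau < \theta \le a\tau, \\
  \dfrac{(a+1)\tau^2}{2}, & \theta > a\tau.
\end{cases}
```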
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Yuen, KC | - |
dc.contributor.author | Zhou, Yunpeng | - |
dc.contributor.author | 周云鵬 | - |
dc.date.accessioned | 2024-05-07T01:22:15Z | - |
dc.date.available | 2024-05-07T01:22:15Z | - |
dc.date.issued | 2023 | - |
dc.identifier.citation | Zhou, Y. [周云鵬]. (2023). Study of survival models with infinite parameter space and its application in network analysis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/342895 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Survival analysis (Biometry) | - |
dc.title | Study of survival models with infinite parameter space and its application in network analysis | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Statistics and Actuarial Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2024 | - |
dc.identifier.mmsid | 991044791815603414 | - |
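The fused SCAD penalty mentioned in the abstract can be illustrated numerically. The following is a minimal NumPy sketch under stated assumptions: the function names `scad_penalty` and `fused_scad_penalty` are hypothetical, the default `a = 3.7` is the value commonly suggested by Fan and Li (2001), and fusing over all pairs i < j is one possible choice that may differ from the pair set used in the thesis.

```python
import numpy as np


def scad_penalty(theta, lam, a=3.7):
    """Elementwise SCAD penalty p_lam(|theta|) of Fan and Li (2001)."""
    if lam < 0 or a <= 2:
        raise ValueError("SCAD requires lam >= 0 and a > 2")
    t = np.abs(np.asarray(theta, dtype=float))
    # |theta| <= lam: linear, as in the LASSO
    linear = lam * t
    # lam < |theta| <= a*lam: quadratic spline joining the two outer pieces
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    # |theta| > a*lam: constant, so large values are not penalized further
    constant = (a + 1) * lam**2 / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))


def fused_scad_penalty(alpha, lam, a=3.7):
    """Sum of SCAD penalties over all pairwise differences alpha_i - alpha_j, i < j.

    Fusing over every pair is an illustrative assumption, not necessarily
    the pair set used in the thesis.
    """
    alpha = np.asarray(alpha, dtype=float)
    # All index pairs (i, j) with i < j
    i, j = np.triu_indices(alpha.size, k=1)
    return float(np.sum(scad_penalty(alpha[i] - alpha[j], lam, a)))
```

For instance, `fused_scad_penalty([0.0, 0.0, 5.0], 1.0)` adds 0 for the pair of equal parameters and (a + 1)λ²/2 = 2.35 for each of the two distant pairs, 4.7 in total. When a penalty of this kind is added to the negative log partial likelihood, small differences between node parameters can be shrunk exactly to zero, merging nodes into groups, while differences beyond aλ incur only a bounded penalty and are left essentially unshrunk; this is the usual reason for preferring SCAD over an ℓ1 fusion penalty.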