
postgraduate thesis: Study of survival models with infinite parameter space and its application in network analysis

Title: Study of survival models with infinite parameter space and its application in network analysis
Authors: Zhou, Yunpeng (周云鵬)
Advisor(s): Yuen, KC
Issue Date: 2023
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Zhou, Y. [周云鵬]. (2023). Study of survival models with infinite parameter space and its application in network analysis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: High-dimensional data are commonly observed in survival analysis, which calls for survival models with an infinite parameter space. For example, genomic data such as DNA microarray data are frequently used for risk prediction, but the number of parameters p is typically larger than the number of observations n. In datasets with a large number of covariates, the parameter space often exhibits sparsity or homogeneity. Therefore, it is crucial to develop methods that estimate the coefficients accurately and identify the significant parameters affecting the hazard rate. Penalized regression such as the LASSO is a common choice for variable selection in real applications. To improve estimation accuracy, this thesis proposes two algorithms that approximate the solution to the l0-penalized regression. Both methods perform well in selecting the subset of parameters, especially in controlling the false positive rate. In addition, since the hazard rate of a survival model describes the frequency of event occurrence, it is natural to extend its application to network analysis to describe the communication frequency between individuals. Recurrent network event data are most relevant for studying phenomena that involve repeated interactions between subjects over time, such as communication networks or social networks. The analysis of such data is more complex than that of static network data, as one needs to analyze the effects of network structure and temporal dynamics simultaneously. Here we propose new approaches that use two separate sets of parameters to account for degree heterogeneity and homophily, respectively. Meanwhile, the baseline intensity function is left completely unspecified to flexibly capture the time-varying pattern of the underlying process. Under a semi-parametric model, we apply the fused smoothly clipped absolute deviation (SCAD) penalty for group identification.
To incorporate more dynamic network structure, we then propose a fully non-parametric model based on a counting process with time-varying parameters. Simulation studies are carried out to verify the consistency and asymptotic properties of the proposed models and to evaluate their finite-sample performance. The models are also applied to several network datasets for illustration.
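For readers unfamiliar with the SCAD penalty named in the abstract, the following is a minimal sketch of its standard textbook form, not the thesis's implementation. The function names `scad_penalty` and `fused_scad`, the default `a=3.7`, and the choice to fuse over all pairwise differences are assumptions made for illustration; the thesis may define the fused penalty differently.

```python
def scad_penalty(theta, lam, a=3.7):
    """Standard SCAD penalty for a scalar theta.

    lam is the tuning parameter (lam > 0); a > 2 controls where the
    penalty flattens out. a=3.7 is a conventional default, assumed here.
    """
    t = abs(theta)
    if t <= lam:
        # Linear, LASSO-like region near zero.
        return lam * t
    if t <= a * lam:
        # Quadratic transition region.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant region: large values are not penalized further.
    return lam * lam * (a + 1) / 2


def fused_scad(params, lam, a=3.7):
    """Hypothetical fused form: SCAD applied to every pairwise difference.

    Parameters whose differences are shrunk to zero end up sharing a
    common value, which is how a fusion penalty can identify groups.
    """
    n = len(params)
    return sum(
        scad_penalty(params[i] - params[j], lam, a)
        for i in range(n)
        for j in range(i + 1, n)
    )
```

Because the penalty is constant beyond `a * lam`, well-separated parameters all pay the same fixed cost, whereas nearby parameters are pulled together at a LASSO-like rate.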
Degree: Doctor of Philosophy
Subject: Survival analysis (Biometry)
Dept/Program: Statistics and Actuarial Science
Persistent Identifier: http://hdl.handle.net/10722/342895

 

DC Field | Value | Language
dc.contributor.advisor | Yuen, KC | -
dc.contributor.author | Zhou, Yunpeng | -
dc.contributor.author | 周云鵬 | -
dc.date.accessioned | 2024-05-07T01:22:15Z | -
dc.date.available | 2024-05-07T01:22:15Z | -
dc.date.issued | 2023 | -
dc.identifier.citation | Zhou, Y. [周云鵬]. (2023). Study of survival models with infinite parameter space and its application in network analysis. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/342895 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Survival analysis (Biometry) | -
dc.title | Study of survival models with infinite parameter space and its application in network analysis | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Statistics and Actuarial Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2024 | -
dc.identifier.mmsid | 991044791815603414 | -
