
postgraduate thesis: Proportional and compositional data modeling with possible zero observations

Title: Proportional and compositional data modeling with possible zero observations
Authors: Liu, Pengyi (刘鹏懿)
Issue Date: 2021
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Liu, P. [刘鹏懿]. (2021). Proportional and compositional data modeling with possible zero observations. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Statistical modeling of proportional and compositional data is studied in this thesis, which is structured into three parts. The first part of the thesis introduces the so-called zero-one-inflated simplex (ZOIS) distribution, which can be viewed as a mixture of the Bernoulli distribution and the simplex distribution. This new distribution is useful for modeling continuous data belonging to the closed unit interval [0, 1]. A new minorization-maximization (MM) algorithm is established to calculate the maximum likelihood estimates (MLEs) of the parameters of the distribution, and some likelihood-based inference methods are presented for the ZOIS regression model. A real application of the proposed methods is then carried out for illustration purposes, and a comparison between the ZOIS model and the zero-one-inflated beta (ZOIB) model is also made. The second part of the thesis develops the so-called proportional inverse Gaussian (PIG) distribution for analyzing continuous proportional data in (0, 1). In view of the complexity of an integral in the PIG density function, a novel MM algorithm is constructed by means of the continuous version of Jensen’s inequality for calculating the MLEs of the parameters in the PIG distribution. Also, by making use of the gradient descent algorithm, an MM algorithm is developed for computing the MLEs of the parameters in the PIG regression model. This MM algorithm allows us to explore the relationship between a set of covariates and the mean parameter. Simulation studies are then conducted to assess the performance of the proposed methods. Some real data analyses are also carried out, based on the hospital stay data of Barcelona in 1988 and 1990. Empirical evidence shows that the PIG distribution performs better than the beta and simplex distributions in terms of the Akaike information criterion (AIC), the Cramér–von Mises test and the Kolmogorov–Smirnov test.
In the third part of the thesis, a new distribution, namely the compositional inverse Gaussian (CIG) distribution, is proposed for analyzing compositional data (CoDa). Based on the stochastic representation (SR) rather than the CIG density with an intractable integral, an expectation-maximization (EM) algorithm is adopted to estimate the parameters of the distribution. Furthermore, for the estimation of the CIG regression model, another EM algorithm is derived using the one-step gradient descent algorithm. Real data analyses show that the CIG distribution and regression model outperform other existing methods in terms of the AIC. Since zeros are often observed when dealing with CoDa, a new model, namely the compositional inverse Gaussian random vector with zero components (ZCIG) model, is proposed for modeling CoDa with zeros; it uses a novel mixture SR based on both the CIG random vector and the so-called zero-truncated product Bernoulli random vector. Parameter estimation of the ZCIG model is also examined. Two real data sets are analyzed using the proposed statistical methods, and a comparison between the proposed models and two existing models (Dirichlet and logistic-normal) is also presented in the thesis.
Degree: Doctor of Philosophy
Subject: Correlation (Statistics); Multivariate analysis
Dept/Program: Statistics and Actuarial Science
Persistent Identifier: http://hdl.handle.net/10722/308608

 

DC Field | Value | Language
dc.contributor.author | Liu, Pengyi | -
dc.contributor.author | 刘鹏懿 | -
dc.date.accessioned | 2021-12-06T01:03:59Z | -
dc.date.available | 2021-12-06T01:03:59Z | -
dc.date.issued | 2021 | -
dc.identifier.citation | Liu, P. [刘鹏懿]. (2021). Proportional and compositional data modeling with possible zero observations. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/308608 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Correlation (Statistics) | -
dc.subject.lcsh | Multivariate analysis | -
dc.title | Proportional and compositional data modeling with possible zero observations | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Statistics and Actuarial Science | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2021 | -
dc.identifier.mmsid | 991044448911803414 | -
