postgraduate thesis: Proportional and compositional data modeling with possible zero observations
Title | Proportional and compositional data modeling with possible zero observations |
---|---|
Authors | Liu, Pengyi (刘鹏懿) |
Issue Date | 2021 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Liu, P. [刘鹏懿]. (2021). Proportional and compositional data modeling with possible zero observations. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | This thesis studies the statistical modeling of proportional and compositional data and is structured in three parts.
The first part introduces the zero-one-inflated simplex (ZOIS) distribution, which can be viewed as a mixture of the Bernoulli distribution and the simplex distribution and is useful for modeling continuous data on the closed unit interval [0, 1]. A new minorization-maximization (MM) algorithm is established to calculate the maximum likelihood estimates (MLEs) of the distribution's parameters, and likelihood-based inference methods are presented for the ZOIS regression model. The proposed methods are illustrated with a real-data application, and the ZOIS model is compared with the zero-one-inflated beta (ZOIB) model.
The second part develops the proportional inverse Gaussian (PIG) distribution for analyzing continuous proportional data in (0, 1). Because the PIG density function involves a complicated integral, a novel MM algorithm is constructed by means of the continuous version of Jensen's inequality to calculate the MLEs of the PIG parameters. In addition, by making use of the gradient descent algorithm, an MM algorithm is developed for computing the MLEs of the parameters in the PIG regression model, which makes it possible to explore the relationship between a set of covariates and the mean parameter. Simulation studies are then conducted to assess the performance of the proposed methods, and real data analyses are carried out using hospital stay data from Barcelona for 1988 and 1990. The empirical evidence shows that the PIG distribution fits these data better than the beta and simplex distributions in terms of the Akaike information criterion (AIC), the Cramér–von Mises test and the Kolmogorov–Smirnov test.
In the third part, a new distribution, the compositional inverse Gaussian (CIG) distribution, is proposed for analyzing compositional data (CoDa). Because the CIG density contains an intractable integral, an expectation-maximization (EM) algorithm based on the stochastic representation (SR) of the distribution, rather than on the density itself, is adopted to estimate its parameters. For the CIG regression model, another EM algorithm is derived using the one-step gradient descent algorithm. Real data analyses show that the CIG distribution and regression model outperform other existing methods in terms of the AIC. Since zeros are often observed in CoDa, a further model, the compositional inverse Gaussian random vector with zero components (ZCIG) model, is proposed for CoDa with zeros; it uses a novel mixture SR based on both the CIG random vector and the zero-truncated product Bernoulli random vector. Parameter estimation for the ZCIG model is also examined. Two real data sets are analyzed using the proposed methods, and the proposed models are compared with two existing models (Dirichlet and logistic-normal). |
Degree | Doctor of Philosophy |
Subject | Correlation (Statistics); Multivariate analysis |
Dept/Program | Statistics and Actuarial Science |
Persistent Identifier | http://hdl.handle.net/10722/308608 |
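The ZOIS distribution described in the abstract can be illustrated with a short Python sketch. It rests on an assumed parameterization, not necessarily the thesis's: P(Y = 0) = p0, P(Y = 1) = p1, and with probability 1 − p0 − p1 the value Y follows the simplex distribution S⁻(μ, σ²) of Barndorff-Nielsen and Jørgensen, whose unit deviance is d(y; μ) = (y − μ)² / [y(1 − y)μ²(1 − μ)²]. Under this assumption the log-likelihood separates, so p0 and p1 have closed-form MLEs, μ minimizes the summed unit deviance (found here by a crude grid search), and σ² is the mean unit deviance. This is a swapped-in direct approach, not the MM algorithm of the thesis, and all function names are hypothetical.

```python
import math


def simplex_unit_deviance(y, mu):
    """Unit deviance d(y; mu) of the simplex distribution, for 0 < y, mu < 1."""
    return (y - mu) ** 2 / (y * (1 - y) * mu ** 2 * (1 - mu) ** 2)


def simplex_logpdf(y, mu, sigma2):
    """Log density of the simplex distribution S^-(mu, sigma2) at 0 < y < 1."""
    return (-0.5 * math.log(2 * math.pi * sigma2 * (y * (1 - y)) ** 3)
            - simplex_unit_deviance(y, mu) / (2 * sigma2))


def zois_loglik(data, p0, p1, mu, sigma2):
    """Log-likelihood of the ASSUMED ZOIS mixture for observations in [0, 1]."""
    total = 0.0
    for y in data:
        if y == 0:
            total += math.log(p0)          # point mass at 0
        elif y == 1:
            total += math.log(p1)          # point mass at 1
        else:                              # continuous simplex part on (0, 1)
            total += math.log(1 - p0 - p1) + simplex_logpdf(y, mu, sigma2)
    return total


def fit_zois(data, grid_size=9999):
    """MLEs under the separable likelihood (hypothetical helper, not the thesis's MM)."""
    n = len(data)
    p0 = sum(1 for y in data if y == 0) / n
    p1 = sum(1 for y in data if y == 1) / n
    interior = [y for y in data if 0 < y < 1]
    # The MLE of mu minimises the summed unit deviance whatever sigma2 is;
    # a grid over (0, 1) is crude but cannot miss the global minimum by much.
    grid = [(k + 1) / (grid_size + 1) for k in range(grid_size)]
    mu = min(grid, key=lambda m: sum(simplex_unit_deviance(y, m) for y in interior))
    # Given mu, the MLE of sigma2 is the mean unit deviance.
    sigma2 = sum(simplex_unit_deviance(y, mu) for y in interior) / len(interior)
    return p0, p1, mu, sigma2
```

For example, `fit_zois([0.0, 0.0, 1.0, 0.3, 0.5, 0.7])` gives p0 = 2/6 and p1 = 1/6, with μ at 0.5 because the interior values are symmetric about 0.5. In a regression setting the parameters depend on covariates and no longer separate this cleanly, which is where an iterative scheme such as the thesis's MM algorithm becomes necessary.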
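The way the continuous version of Jensen's inequality yields an MM algorithm for a density defined through an integral can be written out generically. For illustration only, assume the density has the form f(y; θ) = ∫ g(y, z; θ) dz with g ≥ 0; the actual PIG integrand is not reproduced here, and the symbols g, z, w and Q are introduced for this sketch. Because each weight function w_i^(t) integrates to one, the concavity of the logarithm gives a minorizer of each log-likelihood term:

```latex
\begin{align*}
w_i^{(t)}(z) &= \frac{g(y_i, z; \theta^{(t)})}{f(y_i; \theta^{(t)})},
  \qquad \int w_i^{(t)}(z)\,dz = 1, \\
\log f(y_i; \theta)
  &= \log \int w_i^{(t)}(z)\,\frac{g(y_i, z; \theta)}{w_i^{(t)}(z)}\,dz
  \;\ge\; \int w_i^{(t)}(z)\,\log\frac{g(y_i, z; \theta)}{w_i^{(t)}(z)}\,dz
  =: Q_i(\theta \mid \theta^{(t)}), \\
\theta^{(t+1)} &= \arg\max_{\theta} \sum_{i=1}^{n} Q_i(\theta \mid \theta^{(t)}).
\end{align*}
```

Equality holds at θ = θ^(t), so ℓ(θ^(t+1)) ≥ Σ Q_i(θ^(t+1) | θ^(t)) ≥ Σ Q_i(θ^(t) | θ^(t)) = ℓ(θ^(t)), and no iteration can decrease the log-likelihood. Up to a term that does not depend on θ, this minorizer coincides with the Q-function of the EM algorithm, the approach the abstract uses for the CIG distribution.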
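The comparison criteria named in the abstract (AIC, the Cramér–von Mises test and the Kolmogorov–Smirnov test) can be computed for any fitted continuous model once its maximized log-likelihood and CDF are available. The sketch below implements only the standard one-sample statistic formulas; the function names are hypothetical, and `cdf` stands for the fitted CDF of whichever model (PIG, beta or simplex) is being assessed. P-values are deliberately not computed: when parameters are estimated from the same data, the usual null tables do not apply directly, and a parametric bootstrap is a common remedy.

```python
def aic(loglik, num_params):
    """Akaike information criterion, AIC = 2k - 2 log L (smaller is better)."""
    return 2 * num_params - 2 * loglik


def ks_statistic(data, cdf):
    """One-sample Kolmogorov-Smirnov statistic D_n against a fitted CDF."""
    u = sorted(cdf(x) for x in data)   # probability-integral transforms, sorted
    n = len(u)
    # D_n = max over i of max(i/n - u_(i), u_(i) - (i - 1)/n), with 1-based i
    return max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))


def cvm_statistic(data, cdf):
    """One-sample Cramer-von Mises statistic W^2 against a fitted CDF."""
    u = sorted(cdf(x) for x in data)
    n = len(u)
    # W^2 = 1/(12n) + sum over i of (u_(i) - (2i - 1)/(2n))^2, with 1-based i
    return 1 / (12 * n) + sum((ui - (2 * i + 1) / (2 * n)) ** 2
                              for i, ui in enumerate(u))
```

As a check, with the uniform CDF `lambda x: x` and the data `[0.1, 0.5, 0.9]`, the Kolmogorov–Smirnov statistic is 7/30 and the Cramér–von Mises statistic is 1/36 + 2/225.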
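The abstract does not spell out the stochastic representations of the CIG and ZCIG models, so the sampler below rests on an explicit assumption. By analogy with the normalized-gamma representation of the Dirichlet distribution, a CIG vector is taken to be (X_1, ..., X_d) / Σ X_j with independent X_j ~ IG(μ_j, λ_j), and a ZCIG vector multiplies each X_j by Z_j before normalizing, where Z = (Z_1, ..., Z_d) has independent Bernoulli(π_j) components conditioned on Z not being all zeros (the zero truncation). The thesis's exact parameterization may differ, and the function names are hypothetical. Inverse Gaussian draws use the transformation method of Michael, Schucany and Haas (1976).

```python
import math
import random


def rinvgauss(mu, lam, rng):
    """One IG(mu, lam) draw via the Michael-Schucany-Haas transformation."""
    nu = rng.gauss(0.0, 1.0)
    y = nu * nu
    # smaller root of the quadratic linking y to the inverse Gaussian variate
    x = mu + mu * mu * y / (2 * lam) - (mu / (2 * lam)) * math.sqrt(
        4 * mu * lam * y + mu * mu * y * y)
    # keep x with probability mu / (mu + x), otherwise take the other root mu^2 / x
    return x if rng.random() <= mu / (mu + x) else mu * mu / x


def r_zero_truncated_bernoulli(probs, rng):
    """Independent Bernoulli(p_j) vector conditioned on not being all zeros.

    Uses simple rejection; requires at least one p_j > 0.
    """
    while True:
        z = [1 if rng.random() < p else 0 for p in probs]
        if any(z):
            return z


def r_zcig(mus, lams, probs, rng):
    """One draw from the ASSUMED ZCIG representation; all probs = 1 gives CIG."""
    x = [rinvgauss(m, l, rng) for m, l in zip(mus, lams)]
    z = r_zero_truncated_bernoulli(probs, rng)
    w = [zj * xj for zj, xj in zip(z, x)]   # switch off components where Z_j = 0
    s = sum(w)                              # positive because some Z_j = 1
    return [wj / s for wj in w]
```

For example, `r_zcig([1.0, 2.0, 0.5], [2.0, 3.0, 1.0], [0.7, 0.5, 0.9], random.Random(0))` returns a composition that sums to one and may contain exact zeros, which is the feature the ZCIG model is designed to capture. Passing a seeded `random.Random` instance keeps simulation studies reproducible.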
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Liu, Pengyi | - |
dc.contributor.author | 刘鹏懿 | - |
dc.date.accessioned | 2021-12-06T01:03:59Z | - |
dc.date.available | 2021-12-06T01:03:59Z | - |
dc.date.issued | 2021 | - |
dc.identifier.citation | Liu, P. [刘鹏懿]. (2021). Proportional and compositional data modeling with possible zero observations. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/308608 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Correlation (Statistics) | - |
dc.subject.lcsh | Multivariate analysis | - |
dc.title | Proportional and compositional data modeling with possible zero observations | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Statistics and Actuarial Science | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2021 | - |
dc.identifier.mmsid | 991044448911803414 | - |