postgraduate thesis: Incomplete categorical data, inflated count data analyses and robust modeling with applications
Title | Incomplete categorical data, inflated count data analyses and robust modeling with applications |
---|---|
Authors | Zhang, Chi (张弛) |
Advisors | Yuen, KC; Tian, G |
Issue Date | 2017 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Zhang, C. [张弛]. (2017). Incomplete categorical data, inflated count data analyses and robust modeling with applications. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | In this thesis, some issues related to incomplete categorical data and inflated count data analyses, as well as a robust statistical model, are considered.
The first part investigates the problem of a case-control study with missing data. Specifically, the valid sampling distribution of the observed counts under the assumption of missing at random is derived, and the corresponding statistical inference methods are developed. Theoretical comparisons of the proposed sampling distribution with two existing methods reveal large differences. The results show that conclusions drawn from the Wald test under different sampling distributions may differ completely and even contradict one another.
The second part studies some distributional properties of the zero-and-one inflated Poisson (ZOIP) distribution, which was proposed by Melkersson and Olsson (1999) to model count data with large numbers of zero and one observations. Stochastic representations are constructed for the ZOIP random variable. These representations make it straightforward to use the expectation-maximization algorithm to obtain the maximum likelihood estimates of the parameters of interest. Other likelihood-based inference results, including bootstrap confidence intervals and large-sample hypothesis tests, are also provided.
The third part generalizes the univariate ZOIP distribution to the multivariate case. The multivariate ZOIP distribution can be used to handle multivariate count data with inflated counts for both zero and one. It possesses a very general correlation structure that depends on the values of the parameters, allowing a positive or negative correlation coefficient between any pair of random components. For the proposed multivariate distribution, important distributional properties are derived, and some useful statistical inference methods are developed.
The final part proposes a new multivariate t (MVT) distribution by allowing different degrees of freedom for each univariate component. It includes components following the multivariate normal distribution when the corresponding degrees of freedom tend to infinity. It also contains the product of independent t distributions as a special case. Unlike the classical MVT distribution, this new structure is more flexible in model specification.
The performance of all the proposed methods in this thesis is evaluated through simulation studies and real data analyses. |
Degree | Doctor of Philosophy |
Subject | Sampling (Statistics); Poisson distribution; Multivariate analysis |
Dept/Program | Statistics and Actuarial Science |
Persistent Identifier | http://hdl.handle.net/10722/250805 |
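
The abstract's second part mentions stochastic representations of the ZOIP random variable and an expectation-maximization (EM) algorithm built on them. The sketch below illustrates that idea under an assumed parameterization that is not stated in this record: a three-component mixture with a point mass `phi0` at zero, a point mass `phi1` at one, and a Poisson(`lam`) component with weight `1 - phi0 - phi1`. The function names `zoip_pmf`, `zoip_sample` and `zoip_em` are hypothetical, and the thesis's own formulation and algorithm may differ.

```python
import math
import random


def zoip_pmf(y, phi0, phi1, lam):
    """P(Y = y) under the assumed mixture: mass phi0 at 0, phi1 at 1, rest Poisson(lam)."""
    if y < 0:
        return 0.0
    phi2 = 1.0 - phi0 - phi1
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    return phi0 * (y == 0) + phi1 * (y == 1) + phi2 * poisson


def zoip_sample(phi0, phi1, lam, rng):
    """Draw one value by first picking the latent component, then the count."""
    u = rng.random()
    if u < phi0:
        return 0
    if u < phi0 + phi1:
        return 1
    # Poisson(lam) draw by Knuth's multiplication method (adequate for small lam).
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def zoip_em(data, n_iter=500):
    """EM estimates of (phi0, phi1, lam); no guards for degenerate samples."""
    n = len(data)
    n0 = sum(1 for y in data if y == 0)
    n1 = sum(1 for y in data if y == 1)
    rest_count = n - n0 - n1
    rest_sum = sum(y for y in data if y >= 2)
    # Starting values (an arbitrary choice): half of the observed 0/1 shares.
    phi0, phi1 = n0 / (2 * n), n1 / (2 * n)
    lam = max(sum(data) / n, 0.1)
    for _ in range(n_iter):
        phi2 = 1.0 - phi0 - phi1
        # E-step: probability that an observed 0 (or 1) came from its point mass.
        w0 = phi0 / (phi0 + phi2 * math.exp(-lam))
        w1 = phi1 / (phi1 + phi2 * lam * math.exp(-lam))
        # M-step: weighted Poisson mean, then updated point-mass weights.
        pois_n = n0 * (1 - w0) + n1 * (1 - w1) + rest_count
        lam = (n1 * (1 - w1) + rest_sum) / pois_n
        phi0, phi1 = n0 * w0 / n, n1 * w1 / n
    return phi0, phi1, lam


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [zoip_sample(0.3, 0.2, 3.0, rng) for _ in range(20000)]
    print(zoip_em(sample))
```

With a large simulated sample, the printed estimates should land near the generating values, which is a quick sanity check rather than a substitute for the inference procedures developed in the thesis.
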
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Yuen, KC | - |
dc.contributor.advisor | Tian, G | - |
dc.contributor.author | Zhang, Chi | - |
dc.contributor.author | 张弛 | - |
dc.date.accessioned | 2018-01-26T01:59:35Z | - |
dc.date.available | 2018-01-26T01:59:35Z | - |
dc.date.issued | 2017 | - |
dc.identifier.citation | Zhang, C. [张弛]. (2017). Incomplete categorical data, inflated count data analyses and robust modeling with applications. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/250805 | - |
dc.description.abstract | In this thesis, some issues related to incomplete categorical data and inflated count data analyses, as well as a robust statistical model, are considered. The first part investigates the problem of a case-control study with missing data. Specifically, the valid sampling distribution of the observed counts under the assumption of missing at random is derived, and the corresponding statistical inference methods are developed. Theoretical comparisons of the proposed sampling distribution with two existing methods reveal large differences. The results show that conclusions drawn from the Wald test under different sampling distributions may differ completely and even contradict one another. The second part studies some distributional properties of the zero-and-one inflated Poisson (ZOIP) distribution, which was proposed by Melkersson and Olsson (1999) to model count data with large numbers of zero and one observations. Stochastic representations are constructed for the ZOIP random variable. These representations make it straightforward to use the expectation-maximization algorithm to obtain the maximum likelihood estimates of the parameters of interest. Other likelihood-based inference results, including bootstrap confidence intervals and large-sample hypothesis tests, are also provided. The third part generalizes the univariate ZOIP distribution to the multivariate case. The multivariate ZOIP distribution can be used to handle multivariate count data with inflated counts for both zero and one. It possesses a very general correlation structure that depends on the values of the parameters, allowing a positive or negative correlation coefficient between any pair of random components. For the proposed multivariate distribution, important distributional properties are derived, and some useful statistical inference methods are developed. The final part proposes a new multivariate t (MVT) distribution by allowing different degrees of freedom for each univariate component. It includes components following the multivariate normal distribution when the corresponding degrees of freedom tend to infinity. It also contains the product of independent t distributions as a special case. Unlike the classical MVT distribution, this new structure is more flexible in model specification. The performance of all the proposed methods in this thesis is evaluated through simulation studies and real data analyses. | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Sampling (Statistics) | - |
dc.subject.lcsh | Poisson distribution | - |
dc.subject.lcsh | Multivariate analysis | - |
dc.title | Incomplete categorical data, inflated count data analyses and robust modeling with applications | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Statistics and Actuarial Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_991043979552803414 | - |
dc.date.hkucongregation | 2017 | - |
dc.identifier.mmsid | 991043979552803414 | - |