Article: Choosing the number of factors in factor analysis with incomplete data via a novel hierarchical Bayesian information criterion

Title: Choosing the number of factors in factor analysis with incomplete data via a novel hierarchical Bayesian information criterion
Authors: Zhao, Jianhua; Shang, Changchun; Li, Shulan; Xin, Ling; Yu, Philip L H
Keywords: 62D10; 62F07; 62F99; 62H25; BIC; Factor analysis; Incomplete data; Maximum likelihood; Model selection; Variational Bayesian
Issue Date: 7-Mar-2024
Publisher: Springer
Citation: Advances in Data Analysis and Classification, 2024
Abstract

The Bayesian information criterion (BIC), defined as the observed-data log-likelihood minus a penalty term based on the sample size N, is a popular model selection criterion for factor analysis with complete data. This definition has also been suggested for incomplete data. However, the penalty term based on the ‘complete’ sample size N is the same regardless of whether the data are complete or incomplete. For incomplete data, there are often only N_i < N observations for variable i, which means that using the ‘complete’ sample size N implausibly ignores the amount of missing information inherent in incomplete data. Motivated by this observation, a novel hierarchical BIC (HBIC) criterion is proposed for factor analysis with incomplete data, denoted by HBIC_inc. The novelty is that HBIC_inc uses only the actual amounts of observed information, namely the N_i's, in the penalty term. Theoretically, it is shown that HBIC_inc is a large-sample approximation of the variational Bayesian (VB) lower bound, and BIC is a further approximation of HBIC_inc, which means that HBIC_inc shares the theoretical consistency of BIC. Experiments on synthetic and real data sets are conducted to assess the finite-sample performance of HBIC_inc, BIC, and related criteria under various missing rates. The results show that HBIC_inc and BIC perform similarly when the missing rate is small, but HBIC_inc is more accurate when the missing rate is not small.
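To illustrate the contrast the abstract describes, the sketch below compares a standard BIC-style penalty, which charges every free parameter against the full sample size N, with an HBIC_inc-style penalty that charges each variable's parameters against only its observed count N_i. This is a hypothetical illustration of the idea only, not the article's exact formula; the function names, the per-variable parameter counts `d_per_var`, and the observed counts `N_obs` are assumptions introduced here for demonstration.

```python
import numpy as np

def bic_penalty(d, N):
    """BIC-style penalty: all d free parameters share the full sample size N."""
    return 0.5 * d * np.log(N)

def hbic_inc_penalty(d_per_var, N_obs):
    """Sketch of an HBIC_inc-style penalty for incomplete data (illustrative only):
    variable i's d_i free parameters are penalized with the number of actually
    observed cases N_i for that variable, rather than the 'complete' N."""
    return 0.5 * sum(d_i * np.log(N_i) for d_i, N_i in zip(d_per_var, N_obs))

# Hypothetical example: 3 variables with 2 free parameters each, N = 100 rows,
# but the variables are observed on only 100, 80, and 60 rows respectively.
d_per_var = [2, 2, 2]
N_obs = [100, 80, 60]
print(bic_penalty(sum(d_per_var), 100))    # penalty using the 'complete' N
print(hbic_inc_penalty(d_per_var, N_obs))  # smaller: reflects the missing data
```

With no missing data (every N_i equal to N) the two penalties coincide, which matches the abstract's observation that HBIC_inc and BIC behave similarly at small missing rates and diverge as the missing rate grows.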


Persistent Identifier: http://hdl.handle.net/10722/351235
ISSN: 1862-5347
2023 Impact Factor: 1.4
2023 SCImago Journal Rankings: 0.594

 

DC Field: Value
dc.contributor.author: Zhao, Jianhua
dc.contributor.author: Shang, Changchun
dc.contributor.author: Li, Shulan
dc.contributor.author: Xin, Ling
dc.contributor.author: Yu, Philip L H
dc.date.accessioned: 2024-11-15T00:39:53Z
dc.date.available: 2024-11-15T00:39:53Z
dc.date.issued: 2024-03-07
dc.identifier.citation: Advances in Data Analysis and Classification, 2024
dc.identifier.issn: 1862-5347
dc.identifier.uri: http://hdl.handle.net/10722/351235
dc.description.abstract: <p>The Bayesian information criterion (BIC), defined as the observed-data log-likelihood minus a penalty term based on the sample size <em>N</em>, is a popular model selection criterion for factor analysis with complete data. This definition has also been suggested for incomplete data. However, the penalty term based on the ‘complete’ sample size <em>N</em> is the same regardless of whether the data are complete or incomplete. For incomplete data, there are often only <em>N<sub>i</sub></em> &lt; <em>N</em> observations for variable <em>i</em>, which means that using the ‘complete’ sample size <em>N</em> implausibly ignores the amount of missing information inherent in incomplete data. Motivated by this observation, a novel hierarchical BIC (HBIC) criterion is proposed for factor analysis with incomplete data, denoted by HBIC<sub>inc</sub>. The novelty is that HBIC<sub>inc</sub> uses only the actual amounts of observed information, namely the <em>N<sub>i</sub></em>’s, in the penalty term. Theoretically, it is shown that HBIC<sub>inc</sub> is a large-sample approximation of the variational Bayesian (VB) lower bound, and BIC is a further approximation of HBIC<sub>inc</sub>, which means that HBIC<sub>inc</sub> shares the theoretical consistency of BIC. Experiments on synthetic and real data sets are conducted to assess the finite-sample performance of HBIC<sub>inc</sub>, BIC, and related criteria under various missing rates. The results show that HBIC<sub>inc</sub> and BIC perform similarly when the missing rate is small, but HBIC<sub>inc</sub> is more accurate when the missing rate is not small.</p>
dc.language: eng
dc.publisher: Springer
dc.relation.ispartof: Advances in Data Analysis and Classification
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject: 62D10
dc.subject: 62F07
dc.subject: 62F99
dc.subject: 62H25
dc.subject: BIC
dc.subject: Factor analysis
dc.subject: Incomplete data
dc.subject: Maximum likelihood
dc.subject: Model selection
dc.subject: Variational Bayesian
dc.title: Choosing the number of factors in factor analysis with incomplete data via a novel hierarchical Bayesian information criterion
dc.type: Article
dc.identifier.doi: 10.1007/s11634-024-00582-w
dc.identifier.scopus: eid_2-s2.0-85186885798
dc.identifier.eissn: 1862-5355
dc.identifier.issnl: 1862-5355
