Utilizing stability criteria in choosing feature selection methods yields reproducible results in microbiome data

Jiang, Lingjing; Haiminen, Niina; Carrieri, Anna Paola; Huang, Shi; Vázquez-Baeza, Yoshiki; Parida, Laxmi; Kim, Ho Cheol; Swafford, Austin D.; Knight, Rob; Natarajan, Loki

Publisher Website: 10.1111/biom.13481
Scopus: eid_2-s2.0-85105985827
PMID: 33914902
WOS: WOS:000651899000001

Citations:
- Scopus: 0
- Web of Science: 0
- PubMed Central: 0
Appears in Collections:
- Faculty of Dentistry: Journal/Magazine Articles

Article: Utilizing stability criteria in choosing feature selection methods yields reproducible results in microbiome data

Title	Utilizing stability criteria in choosing feature selection methods yields reproducible results in microbiome data
Authors	Jiang, Lingjing; Haiminen, Niina; Carrieri, Anna Paola; Huang, Shi; Vázquez-Baeza, Yoshiki; Parida, Laxmi; Kim, Ho Cheol; Swafford, Austin D.; Knight, Rob; Natarajan, Loki
Keywords	classification feature selection microbiome prediction reproducible stability
Issue Date	2021
Citation	Biometrics, 2021. DOI: http://dx.doi.org/10.1111/biom.13481
Abstract	Feature selection is indispensable in microbiome data analysis, but it can be particularly challenging as microbiome data sets are high dimensional, underdetermined, sparse and compositional. Great efforts have recently been made on developing new methods for feature selection that handle the above data characteristics, but almost all methods were evaluated based on performance of model predictions. However, little attention has been paid to address a fundamental question: how appropriate are those evaluation criteria? Most feature selection methods often control the model fit, but the ability to identify meaningful subsets of features cannot be evaluated simply based on the prediction accuracy. If tiny changes to the data would lead to large changes in the chosen feature subset, then many selected features are likely to be a data artifact rather than real biological signal. This crucial need of identifying relevant and reproducible features motivated the reproducibility evaluation criterion such as Stability, which quantifies how robust a method is to perturbations in the data. In our paper, we compare the performance of popular model prediction metrics (MSE or AUC) with proposed reproducibility criterion Stability in evaluating four widely used feature selection methods in both simulations and experimental microbiome applications with continuous or binary outcomes. We conclude that Stability is a preferred feature selection criterion over model prediction metrics because it better quantifies the reproducibility of the feature selection method.
Persistent Identifier	http://hdl.handle.net/10722/311515
ISSN	0006-341X (2023 Impact Factor: 1.4; 2023 SCImago Journal Rankings: 1.480)
ISI Accession Number ID	WOS:000651899000001
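
The abstract describes Stability only qualitatively, as a measure of how robust a feature selection method is to perturbations in the data. As a rough illustration, the sketch below computes one stability estimate in common use, the index of Nogueira, Sechidis and Brown (2018), from a binary matrix recording which features were selected on each perturbed (for example, bootstrapped) data set. Whether this matches the exact definition used in the article is an assumption, and the function name `selection_stability` and the toy selection masks are hypothetical.

```python
import numpy as np


def selection_stability(selections):
    """Stability of feature selection across M perturbed data sets.

    `selections` is an (M, p) binary array: entry [i, f] is 1 if feature f
    was selected on perturbed data set i, else 0.
    Assumed estimator (Nogueira et al., 2018):
        1 - mean_f(s_f^2) / ((k_bar / p) * (1 - k_bar / p))
    where s_f^2 is the unbiased sample variance of feature f's selection
    indicator and k_bar is the average number of selected features.
    """
    Z = np.asarray(selections, dtype=float)
    if Z.ndim != 2:
        raise ValueError("selections must be a 2-D array of shape (M, p)")
    M, p = Z.shape
    if M < 2:
        raise ValueError("need at least two perturbed data sets")

    p_hat = Z.mean(axis=0)           # selection frequency of each feature
    k_bar = Z.sum(axis=1).mean()     # average size of the selected subset
    if k_bar == 0 or k_bar == p:
        raise ValueError("stability is undefined when no or all features are selected")

    var_f = M / (M - 1) * p_hat * (1.0 - p_hat)    # unbiased per-feature variance
    denom = (k_bar / p) * (1.0 - k_bar / p)        # variance expected under random selection
    return 1.0 - var_f.mean() / denom


# Toy masks (hypothetical): 3 perturbed data sets, 4 features.
identical = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]]
mixed = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1]]
print(selection_stability(identical))
print(selection_stability(mixed))
```

Under this estimator, a method that picks the same subset on every perturbed data set scores 1, while subsets that change substantially from one perturbation to the next drive the score toward zero or below. That is the behaviour the abstract contrasts with prediction metrics such as MSE or AUC.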

DC Field	Value	Language
dc.contributor.author	Jiang, Lingjing	-
dc.contributor.author	Haiminen, Niina	-
dc.contributor.author	Carrieri, Anna Paola	-
dc.contributor.author	Huang, Shi	-
dc.contributor.author	Vázquez-Baeza, Yoshiki	-
dc.contributor.author	Parida, Laxmi	-
dc.contributor.author	Kim, Ho Cheol	-
dc.contributor.author	Swafford, Austin D.	-
dc.contributor.author	Knight, Rob	-
dc.contributor.author	Natarajan, Loki	-
dc.date.accessioned	2022-03-22T11:54:07Z	-
dc.date.available	2022-03-22T11:54:07Z	-
dc.date.issued	2021	-
dc.identifier.citation	Biometrics, 2021	-
dc.identifier.issn	0006-341X	-
dc.identifier.uri	http://hdl.handle.net/10722/311515	-
dc.description.abstract	Feature selection is indispensable in microbiome data analysis, but it can be particularly challenging as microbiome data sets are high dimensional, underdetermined, sparse and compositional. Great efforts have recently been made on developing new methods for feature selection that handle the above data characteristics, but almost all methods were evaluated based on performance of model predictions. However, little attention has been paid to address a fundamental question: how appropriate are those evaluation criteria? Most feature selection methods often control the model fit, but the ability to identify meaningful subsets of features cannot be evaluated simply based on the prediction accuracy. If tiny changes to the data would lead to large changes in the chosen feature subset, then many selected features are likely to be a data artifact rather than real biological signal. This crucial need of identifying relevant and reproducible features motivated the reproducibility evaluation criterion such as Stability, which quantifies how robust a method is to perturbations in the data. In our paper, we compare the performance of popular model prediction metrics (MSE or AUC) with proposed reproducibility criterion Stability in evaluating four widely used feature selection methods in both simulations and experimental microbiome applications with continuous or binary outcomes. We conclude that Stability is a preferred feature selection criterion over model prediction metrics because it better quantifies the reproducibility of the feature selection method.	-
dc.language	eng	-
dc.relation.ispartof	Biometrics	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject	classification	-
dc.subject	feature selection	-
dc.subject	microbiome	-
dc.subject	prediction	-
dc.subject	reproducible	-
dc.subject	stability	-
dc.title	Utilizing stability criteria in choosing feature selection methods yields reproducible results in microbiome data	-
dc.type	Article	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.1111/biom.13481	-
dc.identifier.pmid	33914902	-
dc.identifier.scopus	eid_2-s2.0-85105985827	-
dc.identifier.eissn	1541-0420	-
dc.identifier.isi	WOS:000651899000001	-
