
Article: Utilizing stability criteria in choosing feature selection methods yields reproducible results in microbiome data

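Title: Utilizing stability criteria in choosing feature selection methods yields reproducible results in microbiome data
Authors: Jiang, Lingjing; Haiminen, Niina; Carrieri, Anna Paola; Huang, Shi; Vázquez-Baeza, Yoshiki; Parida, Laxmi; Kim, Ho Cheol; Swafford, Austin D.; Knight, Rob; Natarajan, Loki
Keywords: classification; feature selection; microbiome; prediction; reproducible; stability
Issue Date: 2021
Citation: Biometrics, 2021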
Abstract: Feature selection is indispensable in microbiome data analysis, but it can be particularly challenging because microbiome data sets are high dimensional, underdetermined, sparse, and compositional. Great efforts have recently been made to develop new feature selection methods that handle these data characteristics, but almost all methods were evaluated based on the performance of model predictions. However, little attention has been paid to a fundamental question: how appropriate are those evaluation criteria? Most feature selection methods control the model fit, but the ability to identify meaningful subsets of features cannot be evaluated simply based on prediction accuracy. If tiny changes to the data lead to large changes in the chosen feature subset, then many selected features are likely to be data artifacts rather than real biological signal. This crucial need to identify relevant and reproducible features motivated reproducibility evaluation criteria such as Stability, which quantifies how robust a method is to perturbations in the data. In our paper, we compare the performance of popular model prediction metrics (MSE or AUC) with the proposed reproducibility criterion, Stability, in evaluating four widely used feature selection methods in both simulations and experimental microbiome applications with continuous or binary outcomes. We conclude that Stability is a preferred feature selection criterion over model prediction metrics because it better quantifies the reproducibility of the feature selection method.
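The idea of judging a feature selection method by its robustness to data perturbations can be illustrated with a minimal sketch. This is not the paper's Stability criterion, nor any of the four methods it evaluates: as stated assumptions, the selector below is a toy top-k filter on absolute Pearson correlation, the perturbation is bootstrap resampling, and agreement between selected subsets is measured by mean pairwise Jaccard similarity. All function names and the simulated data are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def top_k_by_correlation(X, y, k):
    """Toy selector (assumption): keep the k features with the largest
    absolute Pearson correlation with the outcome y."""
    y_mean = mean(y)
    var_y = sum((b - y_mean) ** 2 for b in y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        c_mean = mean(col)
        cov = sum((a - c_mean) * (b - y_mean) for a, b in zip(col, y))
        var_c = sum((a - c_mean) ** 2 for a in col)
        denom = (var_c * var_y) ** 0.5
        scores.append(abs(cov / denom) if denom > 0 else 0.0)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return frozenset(ranked[:k])


def jaccard(a, b):
    """Jaccard similarity of two feature subsets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def selection_stability(X, y, k, n_boot=30, seed=0):
    """Mean pairwise Jaccard similarity of the subsets selected on
    bootstrap resamples of the rows (an illustrative stability proxy)."""
    rng = random.Random(seed)
    n = len(y)
    subsets = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        subsets.append(
            top_k_by_correlation([X[i] for i in idx], [y[i] for i in idx], k)
        )
    return mean(jaccard(a, b) for a, b in combinations(subsets, 2))


def simulate(n=80, p=40, signal=2.0, seed=1):
    """Simulated data (assumption): only features 0, 1, 2 affect y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [signal * (row[0] + row[1] + row[2]) + rng.gauss(0, 1) for row in X]
    return X, y


X_sig, y_sig = simulate(signal=2.0)      # outcome driven by three features
X_noise, y_noise = simulate(signal=0.0)  # outcome is pure noise
stab_signal = selection_stability(X_sig, y_sig, k=3)
stab_noise = selection_stability(X_noise, y_noise, k=3)
print(f"stability with signal: {stab_signal:.2f}")
print(f"stability with noise:  {stab_noise:.2f}")
```

In this sketch, a selector applied to an outcome that is pure noise still returns three features on every resample, but the chosen subsets agree poorly with one another, which is the kind of data artifact the abstract warns about.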
Persistent Identifier: http://hdl.handle.net/10722/311515
ISSN: 0006-341X
2021 Impact Factor: 1.701
2020 SCImago Journal Rankings: 2.298
ISI Accession Number ID: WOS:000651899000001

 

DC Field | Value | Language
dc.contributor.author | Jiang, Lingjing | -
dc.contributor.author | Haiminen, Niina | -
dc.contributor.author | Carrieri, Anna Paola | -
dc.contributor.author | Huang, Shi | -
dc.contributor.author | Vázquez-Baeza, Yoshiki | -
dc.contributor.author | Parida, Laxmi | -
dc.contributor.author | Kim, Ho Cheol | -
dc.contributor.author | Swafford, Austin D. | -
dc.contributor.author | Knight, Rob | -
dc.contributor.author | Natarajan, Loki | -
dc.date.accessioned | 2022-03-22T11:54:07Z | -
dc.date.available | 2022-03-22T11:54:07Z | -
dc.date.issued | 2021 | -
dc.identifier.citation | Biometrics, 2021 | -
dc.identifier.issn | 0006-341X | -
dc.identifier.uri | http://hdl.handle.net/10722/311515 | -
dc.description.abstract | Feature selection is indispensable in microbiome data analysis, but it can be particularly challenging because microbiome data sets are high dimensional, underdetermined, sparse, and compositional. Great efforts have recently been made to develop new feature selection methods that handle these data characteristics, but almost all methods were evaluated based on the performance of model predictions. However, little attention has been paid to a fundamental question: how appropriate are those evaluation criteria? Most feature selection methods control the model fit, but the ability to identify meaningful subsets of features cannot be evaluated simply based on prediction accuracy. If tiny changes to the data lead to large changes in the chosen feature subset, then many selected features are likely to be data artifacts rather than real biological signal. This crucial need to identify relevant and reproducible features motivated reproducibility evaluation criteria such as Stability, which quantifies how robust a method is to perturbations in the data. In our paper, we compare the performance of popular model prediction metrics (MSE or AUC) with the proposed reproducibility criterion, Stability, in evaluating four widely used feature selection methods in both simulations and experimental microbiome applications with continuous or binary outcomes. We conclude that Stability is a preferred feature selection criterion over model prediction metrics because it better quantifies the reproducibility of the feature selection method. | -
dc.language | eng | -
dc.relation.ispartof | Biometrics | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject | classification | -
dc.subject | feature selection | -
dc.subject | microbiome | -
dc.subject | prediction | -
dc.subject | reproducible | -
dc.subject | stability | -
dc.title | Utilizing stability criteria in choosing feature selection methods yields reproducible results in microbiome data | -
dc.type | Article | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.1111/biom.13481 | -
dc.identifier.pmid | 33914902 | -
dc.identifier.scopus | eid_2-s2.0-85105985827 | -
dc.identifier.eissn | 1541-0420 | -
dc.identifier.isi | WOS:000651899000001 | -
