Article: Statistical Hypothesis Testing versus Machine Learning Binary Classification: Distinctions and Guidelines

Title: Statistical Hypothesis Testing versus Machine Learning Binary Classification: Distinctions and Guidelines
Authors: Li, Jingyi Jessica; Tong, Xin
Keywords: DSML 1: Concept: Basic principles of a new data science output observed and reported
Issue Date: 2020
Citation: Patterns, 2020, v. 1, n. 7, article no. 100115
Abstract: Making binary decisions is a common data analytical task in scientific research and industrial applications. In data sciences, there are two related but distinct strategies: hypothesis testing and binary classification. In practice, how to choose between these two strategies can be unclear and rather confusing. Here, we summarize key distinctions between these two strategies in three aspects and list five practical guidelines for data analysts to choose the appropriate strategy for specific analysis needs. We demonstrate the use of those guidelines in a cancer driver gene prediction example. In data science education, two analysis strategies, hypothesis testing and binary classification, are mostly covered in different courses and textbooks. In real data application, it can be puzzling whether a binary decision problem should be formulated as hypothesis testing or binary classification. This article aims to disentangle the puzzle for data science students and researchers by offering practical guidelines for choosing between the two strategies. Hypothesis testing and binary classification are two data analysis strategies taught mostly in different undergraduate classes and rarely compared with each other. As a result, which strategy is more appropriate for a specific real-world data analysis task is often ambiguous. To address this issue, this perspective article clarifies the distinctions between the two strategies and offers practical guidelines to the broad data science discipline.
Persistent Identifier: http://hdl.handle.net/10722/354181

 

DC Field | Value | Language
dc.contributor.author | Li, Jingyi Jessica | -
dc.contributor.author | Tong, Xin | -
dc.date.accessioned | 2025-02-07T08:47:00Z | -
dc.date.available | 2025-02-07T08:47:00Z | -
dc.date.issued | 2020 | -
dc.identifier.citation | Patterns, 2020, v. 1, n. 7, article no. 100115 | -
dc.identifier.uri | http://hdl.handle.net/10722/354181 | -
dc.description.abstract | Making binary decisions is a common data analytical task in scientific research and industrial applications. In data sciences, there are two related but distinct strategies: hypothesis testing and binary classification. In practice, how to choose between these two strategies can be unclear and rather confusing. Here, we summarize key distinctions between these two strategies in three aspects and list five practical guidelines for data analysts to choose the appropriate strategy for specific analysis needs. We demonstrate the use of those guidelines in a cancer driver gene prediction example. In data science education, two analysis strategies, hypothesis testing and binary classification, are mostly covered in different courses and textbooks. In real data application, it can be puzzling whether a binary decision problem should be formulated as hypothesis testing or binary classification. This article aims to disentangle the puzzle for data science students and researchers by offering practical guidelines for choosing between the two strategies. Hypothesis testing and binary classification are two data analysis strategies taught mostly in different undergraduate classes and rarely compared with each other. As a result, which strategy is more appropriate for a specific real-world data analysis task is often ambiguous. To address this issue, this perspective article clarifies the distinctions between the two strategies and offers practical guidelines to the broad data science discipline. | -
dc.language | eng | -
dc.relation.ispartof | Patterns | -
dc.subject | DSML 1: Concept: Basic principles of a new data science output observed and reported | -
dc.title | Statistical Hypothesis Testing versus Machine Learning Binary Classification: Distinctions and Guidelines | -
dc.type | Article | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1016/j.patter.2020.100115 | -
dc.identifier.scopus | eid_2-s2.0-85102966494 | -
dc.identifier.volume | 1 | -
dc.identifier.issue | 7 | -
dc.identifier.spage | article no. 100115 | -
dc.identifier.epage | article no. 100115 | -
dc.identifier.eissn | 2666-3899 | -
