Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1016/j.patter.2020.100115
- Scopus: eid_2-s2.0-85102966494
Citations:
- Scopus: 33
Article: Statistical Hypothesis Testing versus Machine Learning Binary Classification: Distinctions and Guidelines
Title | Statistical Hypothesis Testing versus Machine Learning Binary Classification: Distinctions and Guidelines |
---|---|
Authors | Li, Jingyi Jessica; Tong, Xin |
Keywords | DSML 1: Concept: Basic principles of a new data science output observed and reported |
Issue Date | 2020 |
Citation | Patterns, 2020, v. 1, n. 7, article no. 100115 |
Abstract | Making binary decisions is a common data analytical task in scientific research and industrial applications. In data sciences, there are two related but distinct strategies: hypothesis testing and binary classification. In practice, how to choose between these two strategies can be unclear and rather confusing. Here, we summarize key distinctions between these two strategies in three aspects and list five practical guidelines for data analysts to choose the appropriate strategy for specific analysis needs. We demonstrate the use of those guidelines in a cancer driver gene prediction example. In data science education, two analysis strategies, hypothesis testing and binary classification, are mostly covered in different courses and textbooks. In real data application, it can be puzzling whether a binary decision problem should be formulated as hypothesis testing or binary classification. This article aims to disentangle the puzzle for data science students and researchers by offering practical guidelines for choosing between the two strategies. Hypothesis testing and binary classification are two data analysis strategies taught mostly in different undergraduate classes and rarely compared with each other. As a result, which strategy is more appropriate for a specific real-world data analysis task is often ambiguous. To address this issue, this perspective article clarifies the distinctions between the two strategies and offers practical guidelines to the broad data science discipline. |
Persistent Identifier | http://hdl.handle.net/10722/354181 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Li, Jingyi Jessica | - |
dc.contributor.author | Tong, Xin | - |
dc.date.accessioned | 2025-02-07T08:47:00Z | - |
dc.date.available | 2025-02-07T08:47:00Z | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | Patterns, 2020, v. 1, n. 7, article no. 100115 | - |
dc.identifier.uri | http://hdl.handle.net/10722/354181 | - |
dc.description.abstract | Making binary decisions is a common data analytical task in scientific research and industrial applications. In data sciences, there are two related but distinct strategies: hypothesis testing and binary classification. In practice, how to choose between these two strategies can be unclear and rather confusing. Here, we summarize key distinctions between these two strategies in three aspects and list five practical guidelines for data analysts to choose the appropriate strategy for specific analysis needs. We demonstrate the use of those guidelines in a cancer driver gene prediction example. In data science education, two analysis strategies, hypothesis testing and binary classification, are mostly covered in different courses and textbooks. In real data application, it can be puzzling whether a binary decision problem should be formulated as hypothesis testing or binary classification. This article aims to disentangle the puzzle for data science students and researchers by offering practical guidelines for choosing between the two strategies. Hypothesis testing and binary classification are two data analysis strategies taught mostly in different undergraduate classes and rarely compared with each other. As a result, which strategy is more appropriate for a specific real-world data analysis task is often ambiguous. To address this issue, this perspective article clarifies the distinctions between the two strategies and offers practical guidelines to the broad data science discipline. | - |
dc.language | eng | - |
dc.relation.ispartof | Patterns | - |
dc.subject | DSML 1: Concept: Basic principles of a new data science output observed and reported | - |
dc.title | Statistical Hypothesis Testing versus Machine Learning Binary Classification: Distinctions and Guidelines | - |
dc.type | Article | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1016/j.patter.2020.100115 | - |
dc.identifier.scopus | eid_2-s2.0-85102966494 | - |
dc.identifier.volume | 1 | - |
dc.identifier.issue | 7 | - |
dc.identifier.spage | article no. 100115 | - |
dc.identifier.epage | article no. 100115 | - |
dc.identifier.eissn | 2666-3899 | - |