Statistical Hypothesis Testing versus Machine Learning Binary Classification: Distinctions and Guidelines

Li, Jingyi Jessica; Tong, Xin

Links for fulltext

(May Require Subscription)

Publisher Website: 10.1016/j.patter.2020.100115
Scopus: eid_2-s2.0-85102966494

Supplementary

Citations:
- Scopus: 33
Appears in Collections:
- Faculty of Business & Economics: Journal/Magazine Articles

Article: Statistical Hypothesis Testing versus Machine Learning Binary Classification: Distinctions and Guidelines

Title	Statistical Hypothesis Testing versus Machine Learning Binary Classification: Distinctions and Guidelines
Authors	Li, Jingyi Jessica; Tong, Xin
Keywords	DSML 1: Concept: Basic principles of a new data science output observed and reported
Issue Date	2020
Citation	Patterns, 2020, v. 1, n. 7, article no. 100115. DOI: http://dx.doi.org/10.1016/j.patter.2020.100115
Abstract	Making binary decisions is a common data analytical task in scientific research and industrial applications. In data sciences, there are two related but distinct strategies: hypothesis testing and binary classification. In practice, how to choose between these two strategies can be unclear and rather confusing. Here, we summarize key distinctions between these two strategies in three aspects and list five practical guidelines for data analysts to choose the appropriate strategy for specific analysis needs. We demonstrate the use of those guidelines in a cancer driver gene prediction example. In data science education, two analysis strategies, hypothesis testing and binary classification, are mostly covered in different courses and textbooks. In real data application, it can be puzzling whether a binary decision problem should be formulated as hypothesis testing or binary classification. This article aims to disentangle the puzzle for data science students and researchers by offering practical guidelines for choosing between the two strategies. Hypothesis testing and binary classification are two data analysis strategies taught mostly in different undergraduate classes and rarely compared with each other. As a result, which strategy is more appropriate for a specific real-world data analysis task is often ambiguous. To address this issue, this perspective article clarifies the distinctions between the two strategies and offers practical guidelines to the broad data science discipline.
Persistent Identifier	http://hdl.handle.net/10722/354181

DC Field	Value	Language
dc.contributor.author	Li, Jingyi Jessica	-
dc.contributor.author	Tong, Xin	-
dc.date.accessioned	2025-02-07T08:47:00Z	-
dc.date.available	2025-02-07T08:47:00Z	-
dc.date.issued	2020	-
dc.identifier.citation	Patterns, 2020, v. 1, n. 7, article no. 100115	-
dc.identifier.uri	http://hdl.handle.net/10722/354181	-
dc.description.abstract	Making binary decisions is a common data analytical task in scientific research and industrial applications. In data sciences, there are two related but distinct strategies: hypothesis testing and binary classification. In practice, how to choose between these two strategies can be unclear and rather confusing. Here, we summarize key distinctions between these two strategies in three aspects and list five practical guidelines for data analysts to choose the appropriate strategy for specific analysis needs. We demonstrate the use of those guidelines in a cancer driver gene prediction example. In data science education, two analysis strategies, hypothesis testing and binary classification, are mostly covered in different courses and textbooks. In real data application, it can be puzzling whether a binary decision problem should be formulated as hypothesis testing or binary classification. This article aims to disentangle the puzzle for data science students and researchers by offering practical guidelines for choosing between the two strategies. Hypothesis testing and binary classification are two data analysis strategies taught mostly in different undergraduate classes and rarely compared with each other. As a result, which strategy is more appropriate for a specific real-world data analysis task is often ambiguous. To address this issue, this perspective article clarifies the distinctions between the two strategies and offers practical guidelines to the broad data science discipline.	-
dc.language	eng	-
dc.relation.ispartof	Patterns	-
dc.subject	DSML 1: Concept: Basic principles of a new data science output observed and reported	-
dc.title	Statistical Hypothesis Testing versus Machine Learning Binary Classification: Distinctions and Guidelines	-
dc.type	Article	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1016/j.patter.2020.100115	-
dc.identifier.scopus	eid_2-s2.0-85102966494	-
dc.identifier.volume	1	-
dc.identifier.issue	7	-
dc.identifier.spage	article no. 100115	-
dc.identifier.epage	article no. 100115	-
dc.identifier.eissn	2666-3899	-
