Article: Classification Trees for Imbalanced Data: Surface-to-Volume Regularization

Title: Classification Trees for Imbalanced Data: Surface-to-Volume Regularization
Authors: Zhu, Yichen; Li, Cheng; Dunson, David B.
Keywords: CART; Categorical data; Decision boundary; Shape penalization
Issue Date: 2023
Citation: Journal of the American Statistical Association, 2023, v. 118, n. 543, p. 1707-1717
Abstract: Classification algorithms face difficulties when one or more classes have limited training data. We are particularly interested in classification trees, due to their interpretability and flexibility. When data are limited in one or more of the classes, the estimated decision boundaries are often irregularly shaped due to the limited sample size, leading to poor generalization error. We propose a novel approach that penalizes the Surface-to-Volume Ratio (SVR) of the decision set, obtaining a new class of SVR-Tree algorithms. We develop a simple and computationally efficient implementation while proving estimation consistency for SVR-Tree and rate of convergence for an idealized empirical risk minimizer of SVR-Tree. SVR-Tree is compared with multiple algorithms that are designed to deal with imbalance through real data applications. Supplementary materials for this article are available online.
Persistent Identifier: http://hdl.handle.net/10722/367572
ISSN: 0162-1459
2023 Impact Factor: 3.0
2023 SCImago Journal Rankings: 3.922

 

DC Field: Value
dc.contributor.author: Zhu, Yichen
dc.contributor.author: Li, Cheng
dc.contributor.author: Dunson, David B.
dc.date.accessioned: 2025-12-19T07:57:48Z
dc.date.available: 2025-12-19T07:57:48Z
dc.date.issued: 2023
dc.identifier.citation: Journal of the American Statistical Association, 2023, v. 118, n. 543, p. 1707-1717
dc.identifier.issn: 0162-1459
dc.identifier.uri: http://hdl.handle.net/10722/367572
dc.description.abstract: Classification algorithms face difficulties when one or more classes have limited training data. We are particularly interested in classification trees, due to their interpretability and flexibility. When data are limited in one or more of the classes, the estimated decision boundaries are often irregularly shaped due to the limited sample size, leading to poor generalization error. We propose a novel approach that penalizes the Surface-to-Volume Ratio (SVR) of the decision set, obtaining a new class of SVR-Tree algorithms. We develop a simple and computationally efficient implementation while proving estimation consistency for SVR-Tree and rate of convergence for an idealized empirical risk minimizer of SVR-Tree. SVR-Tree is compared with multiple algorithms that are designed to deal with imbalance through real data applications. Supplementary materials for this article are available online.
dc.language: eng
dc.relation.ispartof: Journal of the American Statistical Association
dc.subject: CART
dc.subject: Categorical data
dc.subject: Decision boundary
dc.subject: Shape penalization
dc.title: Classification Trees for Imbalanced Data: Surface-to-Volume Regularization
dc.type: Article
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.doi: 10.1080/01621459.2021.2005609
dc.identifier.scopus: eid_2-s2.0-85122314415
dc.identifier.volume: 118
dc.identifier.issue: 543
dc.identifier.spage: 1707
dc.identifier.epage: 1717
dc.identifier.eissn: 1537-274X
