Article: k-Tree: Crossing sharp boundaries in regression trees to find neighbors

Title: k-Tree: Crossing sharp boundaries in regression trees to find neighbors
Authors: Tian, Xuecheng; Wang, Shuaian; Zhen, Lu; Shen, Zuo-Jun Max
Keywords: Adaptive distance metric; Decision tree; k-nearest neighbors; Machine learning
Issue Date: 17-Apr-2025
Publisher: Elsevier
Citation: European Journal of Operational Research, 2025, v. 324, n. 3, p. 567-579
Abstract

Traditional classification and regression trees (CARTs) use a top-down, greedy approach to split the feature space into sharply defined, axis-aligned sub-regions (leaves). Each leaf treats all of the samples therein uniformly during prediction, leading to a constant predictor. Although this approach is well known for its interpretability and efficiency, it overlooks the complex local distributions within and across leaves. As the number of features increases, this limitation becomes more pronounced, often resulting in a concentration of samples near the boundaries of the leaves. Such clustering suggests that there is potential in identifying closer neighbors in adjacent leaves, a phenomenon that is unexplored in the literature. Our study addresses this gap by introducing the k-Tree methodology, a novel method that extends the search for nearest neighbors beyond a single leaf to include adjacent leaves. This approach has two key innovations: (1) establishing an adjacency relationship between leaves across the tree space and (2) designing novel intra-leaf and inter-leaf distance metrics through an optimization lens, which are tailored to local data distributions within the tree. We explore three implementations of the k-Tree methodology: (1) the Post-hoc k-Tree (P-Tree), which integrates the k-Tree methodology into constructed decision trees, (2) the Advanced k-Tree, which incorporates the k-Tree methodology during the tree construction process, and (3) the P-random forest, which integrates the P-Tree principles with the random forest framework. The results of empirical evaluations conducted on a variety of real-world and synthetic datasets demonstrate that the k-Tree methods achieve greater prediction accuracy than the traditional models. These results highlight the potential of the k-Tree methodology in enhancing predictive analytics by providing a deeper insight into the relationships between samples within the tree space.
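The post-hoc variant described in the abstract (searching for nearest neighbors in a fitted tree's leaf and its adjacent leaves) can be sketched as follows. This is a minimal illustration, not the paper's method: it uses plain Euclidean distance in place of the optimized intra-leaf and inter-leaf metrics, and approximates leaf adjacency by whether the leaves' axis-aligned bounding boxes touch. All function names (`leaf_boxes`, `adjacent`, `p_tree_predict`) are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def leaf_boxes(reg, n_features):
    """Recover each leaf's axis-aligned bounding box from the split structure."""
    t = reg.tree_
    boxes = {}
    def recurse(node, lo, hi):
        if t.children_left[node] == -1:  # -1 marks a leaf in sklearn's tree arrays
            boxes[node] = (lo.copy(), hi.copy())
            return
        f, thr = t.feature[node], t.threshold[node]
        hi_left = hi.copy(); hi_left[f] = min(hi[f], thr)
        recurse(t.children_left[node], lo, hi_left)
        lo_right = lo.copy(); lo_right[f] = max(lo[f], thr)
        recurse(t.children_right[node], lo_right, hi)
    recurse(0, np.full(n_features, -np.inf), np.full(n_features, np.inf))
    return boxes

def adjacent(b1, b2, eps=1e-9):
    """Proxy for leaf adjacency: the closed boxes intersect (leaves partition
    the space, so intersecting boxes can only share a boundary)."""
    (lo1, hi1), (lo2, hi2) = b1, b2
    return bool(np.all(lo1 <= hi2 + eps) and np.all(lo2 <= hi1 + eps))

# Toy regression problem.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2
reg = DecisionTreeRegressor(max_leaf_nodes=8, random_state=0).fit(X, y)

boxes = leaf_boxes(reg, n_features=2)
leaf_of = reg.apply(X)  # leaf id of each training sample

def p_tree_predict(x, k=5):
    """Average the k nearest training targets drawn from the query's leaf
    plus all leaves adjacent to it (the query's own leaf is included,
    since every box is adjacent to itself)."""
    leaf = reg.apply(x.reshape(1, -1))[0]
    pool_leaves = [l for l in boxes if adjacent(boxes[leaf], boxes[l])]
    mask = np.isin(leaf_of, pool_leaves)
    dists = np.linalg.norm(X[mask] - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return y[mask][nearest].mean()
```

Note that the candidate pool crosses the sharp leaf boundaries that a plain CART would respect: a query point sitting near a split threshold can borrow neighbors from the other side, which is the motivation stated above.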


Persistent Identifier: http://hdl.handle.net/10722/369101
ISSN: 0377-2217
2023 Impact Factor: 6.0
2023 SCImago Journal Rankings: 2.321

 

DC Field: Value
dc.contributor.author: Tian, Xuecheng
dc.contributor.author: Wang, Shuaian
dc.contributor.author: Zhen, Lu
dc.contributor.author: Shen, Zuo-Jun Max
dc.date.accessioned: 2026-01-17T00:35:25Z
dc.date.available: 2026-01-17T00:35:25Z
dc.date.issued: 2025-04-17
dc.identifier.citation: European Journal of Operational Research, 2025, v. 324, n. 3, p. 567-579
dc.identifier.issn: 0377-2217
dc.identifier.uri: http://hdl.handle.net/10722/369101
dc.description.abstract: <p>Traditional classification and regression trees (CARTs) use a top-down, greedy approach to split the feature space into sharply defined, axis-aligned sub-regions (leaves). Each leaf treats all of the samples therein uniformly during prediction, leading to a constant predictor. Although this approach is well known for its interpretability and efficiency, it overlooks the complex local distributions within and across leaves. As the number of features increases, this limitation becomes more pronounced, often resulting in a concentration of samples near the boundaries of the leaves. Such clustering suggests that there is potential in identifying closer neighbors in adjacent leaves, a phenomenon that is unexplored in the literature. Our study addresses this gap by introducing the k-Tree methodology, a novel method that extends the search for nearest neighbors beyond a single leaf to include adjacent leaves. This approach has two key innovations: (1) establishing an adjacency relationship between leaves across the tree space and (2) designing novel intra-leaf and inter-leaf distance metrics through an optimization lens, which are tailored to local data distributions within the tree. We explore three implementations of the k-Tree methodology: (1) the Post-hoc k-Tree (P-Tree), which integrates the k-Tree methodology into constructed decision trees, (2) the Advanced k-Tree, which incorporates the k-Tree methodology during the tree construction process, and (3) the P-random forest, which integrates the P-Tree principles with the random forest framework. The results of empirical evaluations conducted on a variety of real-world and synthetic datasets demonstrate that the k-Tree methods achieve greater prediction accuracy than the traditional models. These results highlight the potential of the k-Tree methodology in enhancing predictive analytics by providing a deeper insight into the relationships between samples within the tree space.</p>
dc.language: eng
dc.publisher: Elsevier
dc.relation.ispartof: European Journal of Operational Research
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject: Adaptive distance metric
dc.subject: Decision tree
dc.subject: k-nearest neighbors
dc.subject: Machine learning
dc.title: k-Tree: Crossing sharp boundaries in regression trees to find neighbors
dc.type: Article
dc.identifier.doi: 10.1016/j.ejor.2025.02.031
dc.identifier.scopus: eid_2-s2.0-105002677032
dc.identifier.volume: 324
dc.identifier.issue: 3
dc.identifier.spage: 567
dc.identifier.epage: 579
dc.identifier.eissn: 1872-6860
dc.identifier.issnl: 0377-2217
