Article: An overview and comparison of machine-learning techniques for classification purposes in digital soil mapping

Title: An overview and comparison of machine-learning techniques for classification purposes in digital soil mapping
Authors: Heung, Brandon; Ho, Hung Chak; Zhang, Jin; Knudby, Anders; Bulmer, Chuck E.; Schmidt, Margaret G.
Keywords: Digital soil mapping; Data-mining; Soil classification; Model comparison; Machine-learning
Issue Date: 2016
Citation: Geoderma, 2016, v. 265, p. 62-77
Abstract: © 2015 Elsevier B.V. Machine-learning is the automated process of uncovering patterns in large datasets using computer-based statistical models, where a fitted model may then be used for prediction purposes on new data. Despite the growing number of machine-learning algorithms that have been developed, relatively few studies have provided a comparison of an array of different learners - typically, model comparison studies have been restricted to a comparison of only a few models. This study evaluates and compares a suite of 10 machine-learners as classification algorithms for the prediction of soil taxonomic units in the Lower Fraser Valley, British Columbia, Canada. A variety of machine-learners (CART, CART with bagging, Random Forest, k-nearest neighbor, nearest shrunken centroid, artificial neural network, multinomial logistic regression, logistic model trees, and support vector machine) were tested in the extraction of the complex relationships between soil taxonomic units (great groups and orders) from a conventional soil survey and a suite of 20 environmental covariates representing the topography, climate, and vegetation of the study area. Methods used to extract training data from a soil survey included by-polygon, equal-class, area-weighted, and area-weighted with random over sampling (ROS) approaches. The fitted models, which consist of the soil-environmental relationships, were then used to predict soil great groups and orders for the entire study area at a 100 m spatial resolution. The resulting maps were validated using 262 points from legacy soil data. On average, the area-weighted sampling approach for developing training data from a soil survey was most effective.
Using a validation of R = 1 cell, the k-nearest neighbor and support vector machine with radial basis function resulted in the highest accuracy of 72% for great groups using ROS; however, models such as CART with bagging, logistic model trees, and Random Forest were preferred due to the speed of parameterization and the interpretability of the results while resulting in similar accuracies ranging from 65-70% using the area-weighted sampling approach. Model choice and sample design greatly influenced outputs. This study provides a comprehensive comparison of machine-learning techniques for classification purposes in soil science and may assist in model selection for digital soil mapping and geomorphic modeling studies in the future.
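The random over sampling (ROS) step named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `random_oversample`, the covariate values, and the class labels are hypothetical, and the sketch assumes ROS means duplicating randomly chosen minority-class training samples until every class is as large as the largest one.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)

    # Group the training samples by class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing samples with replacement from the class itself.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical training cells: (elevation in m, slope in degrees),
# with made-up class labels standing in for soil taxonomic units.
cells = [(12.0, 1.5), (15.0, 2.0), (14.0, 1.0), (300.0, 18.0), (20.0, 0.5)]
classes = ["A", "A", "A", "B", "C"]

balanced_cells, balanced_classes = random_oversample(cells, classes)
print(Counter(balanced_classes))
```

Balancing class counts in this way lets rare classes contribute as many training samples as common ones, at the cost of repeating the same observations.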
Persistent Identifier: http://hdl.handle.net/10722/265680
ISSN: 0016-7061
2023 Impact Factor: 5.6
2023 SCImago Journal Rankings: 1.761
ISI Accession Number ID: WOS:000368746200008

 

DC Field | Value | Language
dc.contributor.author | Heung, Brandon | -
dc.contributor.author | Ho, Hung Chak | -
dc.contributor.author | Zhang, Jin | -
dc.contributor.author | Knudby, Anders | -
dc.contributor.author | Bulmer, Chuck E. | -
dc.contributor.author | Schmidt, Margaret G. | -
dc.date.accessioned | 2018-12-03T01:21:22Z | -
dc.date.available | 2018-12-03T01:21:22Z | -
dc.date.issued | 2016 | -
dc.identifier.citation | Geoderma, 2016, v. 265, p. 62-77 | -
dc.identifier.issn | 0016-7061 | -
dc.identifier.uri | http://hdl.handle.net/10722/265680 | -
dc.language | eng | -
dc.relation.ispartof | Geoderma | -
dc.subject | Digital soil mapping | -
dc.subject | Data-mining | -
dc.subject | Soil classification | -
dc.subject | Model comparison | -
dc.subject | Machine-learning | -
dc.title | An overview and comparison of machine-learning techniques for classification purposes in digital soil mapping | -
dc.type | Article | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1016/j.geoderma.2015.11.014 | -
dc.identifier.scopus | eid_2-s2.0-84947760924 | -
dc.identifier.volume | 265 | -
dc.identifier.spage | 62 | -
dc.identifier.epage | 77 | -
dc.identifier.isi | WOS:000368746200008 | -
dc.identifier.issnl | 0016-7061 | -
