An overview and comparison of machine-learning techniques for classification purposes in digital soil mapping

Heung, Brandon; Ho, Hung Chak; Zhang, Jin; Knudby, Anders; Bulmer, Chuck E.; Schmidt, Margaret G.

Publisher Website: 10.1016/j.geoderma.2015.11.014
Scopus: eid_2-s2.0-84947760924
WOS: WOS:000368746200008

Appears in Collections:
- Urban Planning & Design: Journal/Magazine Articles

Title	An overview and comparison of machine-learning techniques for classification purposes in digital soil mapping
Authors	Heung, Brandon; Ho, Hung Chak; Zhang, Jin; Knudby, Anders; Bulmer, Chuck E.; Schmidt, Margaret G.
Keywords	Digital soil mapping; Data-mining; Soil classification; Model comparison; Machine-learning
Issue Date	2016
Citation	Geoderma, 2016, v. 265, p. 62-77
DOI	http://dx.doi.org/10.1016/j.geoderma.2015.11.014
Abstract	© 2015 Elsevier B.V. Machine-learning is the automated process of uncovering patterns in large datasets using computer-based statistical models, where a fitted model may then be used for prediction purposes on new data. Despite the growing number of machine-learning algorithms that have been developed, relatively few studies have provided a comparison of an array of different learners - typically, model comparison studies have been restricted to a comparison of only a few models. This study evaluates and compares a suite of 10 machine-learners as classification algorithms for the prediction of soil taxonomic units in the Lower Fraser Valley, British Columbia, Canada. A variety of machine-learners (CART, CART with bagging, Random Forest, k-nearest neighbor, nearest shrunken centroid, artificial neural network, multinomial logistic regression, logistic model trees, and support vector machine) were tested in the extraction of the complex relationships between soil taxonomic units (great groups and orders) from a conventional soil survey and a suite of 20 environmental covariates representing the topography, climate, and vegetation of the study area. Methods used to extract training data from a soil survey included by-polygon, equal-class, area-weighted, and area-weighted with random over sampling (ROS) approaches. The fitted models, which consist of the soil-environmental relationships, were then used to predict soil great groups and orders for the entire study area at a 100 m spatial resolution. The resulting maps were validated using 262 points from legacy soil data. On average, the area-weighted sampling approach for developing training data from a soil survey was most effective. Using a validation of R = 1 cell, the k-nearest neighbor and support vector machine with radial basis function resulted in the highest accuracy of 72% for great groups using ROS; however, models such as CART with bagging, logistic model trees, and Random Forest were preferred due to the speed of parameterization and the interpretability of the results while resulting in similar accuracies ranging from 65-70% using the area-weighted sampling approach. Model choice and sample design greatly influenced outputs. This study provides a comprehensive comparison of machine-learning techniques for classification purposes in soil science and may assist in model selection for digital soil mapping and geomorphic modeling studies in the future.
Persistent Identifier	http://hdl.handle.net/10722/265680
ISSN	0016-7061
2023 Impact Factor	5.6
2023 SCImago Journal Rankings	1.761
ISI Accession Number ID	WOS:000368746200008
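
Illustrative note: the abstract names area-weighted sampling and random over sampling (ROS) without describing them in detail. The following is a minimal sketch of one common reading, not the authors' procedure. It assumes that "area-weighted" means the number of training points per class is proportional to the class's mapped area, and that ROS duplicates randomly chosen minority-class samples until every class matches the largest one. The class names, areas, and feature values are hypothetical placeholders.

```python
import random
from collections import Counter


def area_weighted_allocation(class_areas, n_total):
    """Allocate n_total training points to classes in proportion to mapped area.

    Assumption: every class receives at least one point.
    """
    total_area = sum(class_areas.values())
    return {
        label: max(1, round(n_total * area / total_area))
        for label, area in class_areas.items()
    }


def random_oversample(samples, rng):
    """Duplicate randomly chosen samples of smaller classes until all
    classes have as many samples as the largest class."""
    by_class = {}
    for features, label in samples:
        by_class.setdefault(label, []).append((features, label))
    target = max(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced.extend(rows)
        balanced.extend(rng.choice(rows) for _ in range(target - len(rows)))
    return balanced


rng = random.Random(0)

# Hypothetical mapped areas per soil class (arbitrary units).
class_areas = {"A": 600.0, "B": 300.0, "C": 100.0}
allocation = area_weighted_allocation(class_areas, n_total=100)

# Placeholder covariate vectors stand in for values sampled from rasters.
samples = [
    ((rng.random(), rng.random()), label)
    for label, n_points in allocation.items()
    for _ in range(n_points)
]
balanced = random_oversample(samples, rng)
class_counts = Counter(label for _, label in balanced)

print(allocation)
print(class_counts)
```

Under these assumptions the area-weighted set keeps the survey's class proportions, while the oversampled set gives each class equal weight at the cost of repeated samples.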

DC Field	Value	Language
dc.contributor.author	Heung, Brandon	-
dc.contributor.author	Ho, Hung Chak	-
dc.contributor.author	Zhang, Jin	-
dc.contributor.author	Knudby, Anders	-
dc.contributor.author	Bulmer, Chuck E.	-
dc.contributor.author	Schmidt, Margaret G.	-
dc.date.accessioned	2018-12-03T01:21:22Z	-
dc.date.available	2018-12-03T01:21:22Z	-
dc.date.issued	2016	-
dc.identifier.citation	Geoderma, 2016, v. 265, p. 62-77	-
dc.identifier.issn	0016-7061	-
dc.identifier.uri	http://hdl.handle.net/10722/265680	-
dc.description.abstract	© 2015 Elsevier B.V. Machine-learning is the automated process of uncovering patterns in large datasets using computer-based statistical models, where a fitted model may then be used for prediction purposes on new data. Despite the growing number of machine-learning algorithms that have been developed, relatively few studies have provided a comparison of an array of different learners - typically, model comparison studies have been restricted to a comparison of only a few models. This study evaluates and compares a suite of 10 machine-learners as classification algorithms for the prediction of soil taxonomic units in the Lower Fraser Valley, British Columbia, Canada. A variety of machine-learners (CART, CART with bagging, Random Forest, k-nearest neighbor, nearest shrunken centroid, artificial neural network, multinomial logistic regression, logistic model trees, and support vector machine) were tested in the extraction of the complex relationships between soil taxonomic units (great groups and orders) from a conventional soil survey and a suite of 20 environmental covariates representing the topography, climate, and vegetation of the study area. Methods used to extract training data from a soil survey included by-polygon, equal-class, area-weighted, and area-weighted with random over sampling (ROS) approaches. The fitted models, which consist of the soil-environmental relationships, were then used to predict soil great groups and orders for the entire study area at a 100 m spatial resolution. The resulting maps were validated using 262 points from legacy soil data. On average, the area-weighted sampling approach for developing training data from a soil survey was most effective. Using a validation of R = 1 cell, the k-nearest neighbor and support vector machine with radial basis function resulted in the highest accuracy of 72% for great groups using ROS; however, models such as CART with bagging, logistic model trees, and Random Forest were preferred due to the speed of parameterization and the interpretability of the results while resulting in similar accuracies ranging from 65-70% using the area-weighted sampling approach. Model choice and sample design greatly influenced outputs. This study provides a comprehensive comparison of machine-learning techniques for classification purposes in soil science and may assist in model selection for digital soil mapping and geomorphic modeling studies in the future.	-
dc.language	eng	-
dc.relation.ispartof	Geoderma	-
dc.subject	Digital soil mapping	-
dc.subject	Data-mining	-
dc.subject	Soil classification	-
dc.subject	Model comparison	-
dc.subject	Machine-learning	-
dc.title	An overview and comparison of machine-learning techniques for classification purposes in digital soil mapping	-
dc.type	Article	-
dc.description.nature	link_to_subscribed_fulltext	-
dc.identifier.doi	10.1016/j.geoderma.2015.11.014	-
dc.identifier.scopus	eid_2-s2.0-84947760924	-
dc.identifier.volume	265	-
dc.identifier.spage	62	-
dc.identifier.epage	77	-
dc.identifier.isi	WOS:000368746200008	-
dc.identifier.issnl	0016-7061	-
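
Illustrative note: the abstract reports accuracy "using a validation of R = 1 cell" without defining the term. The sketch below shows one plausible reading, offered as an assumption rather than the authors' method: a validation point counts as correct if its observed class appears anywhere in the square window of predicted cells within R cells of the point. The function name `neighborhood_accuracy`, the 3 x 3 grid, and the validation points are hypothetical.

```python
def neighborhood_accuracy(pred_grid, points, radius=1):
    """Share of validation points whose observed class occurs among the
    predicted classes within `radius` cells (square window, clipped at
    the grid edge). radius=0 reduces to exact cell-by-cell matching."""
    n_rows, n_cols = len(pred_grid), len(pred_grid[0])
    hits = 0
    for row, col, observed in points:
        window = {
            pred_grid[r][c]
            for r in range(max(0, row - radius), min(n_rows, row + radius + 1))
            for c in range(max(0, col - radius), min(n_cols, col + radius + 1))
        }
        hits += observed in window
    return hits / len(points)


# Hypothetical predicted class map (rows x columns).
pred_grid = [
    ["A", "A", "B"],
    ["A", "B", "B"],
    ["C", "C", "B"],
]

# Hypothetical validation points: (row, column, observed class).
points = [(0, 0, "A"), (0, 0, "B"), (0, 0, "C"), (2, 2, "A")]

exact = neighborhood_accuracy(pred_grid, points, radius=0)
tolerant = neighborhood_accuracy(pred_grid, points, radius=1)
print(exact, tolerant)
```

Allowing a one-cell tolerance can only keep or raise the score relative to exact matching, which is one reason a neighborhood-based figure should be read alongside the radius used.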
