Article: PARTITIONING AROUND MEDOIDS CLUSTERING AND RANDOM FOREST CLASSIFICATION FOR GIS-INFORMED IMPUTATION OF FLUORIDE CONCENTRATION DATA

Title: PARTITIONING AROUND MEDOIDS CLUSTERING AND RANDOM FOREST CLASSIFICATION FOR GIS-INFORMED IMPUTATION OF FLUORIDE CONCENTRATION DATA
Authors: Gu, Yu; Preisser, John S.; Zeng, Donglin; Shrestha, Poojan; Shah, Molina; Simancas-Pallares, Miguel A.; Ginnis, Jeannie; Divaris, Kimon
Keywords: clustering; Missing values; multiple imputation; random forest; spatial interpolation
Issue Date: 2022
Citation: Annals of Applied Statistics, 2022, v. 16, n. 1, p. 551-572
Abstract: Community water fluoridation is an important component of oral health promotion, as fluoride exposure is a well-documented dental caries-preventive agent. Direct measurements of domestic water fluoride content provide valuable information regarding individuals’ fluoride exposure and thus caries risk; however, they are logistically challenging to carry out at a large scale in oral health research. This article describes the development and evaluation of a novel method for the imputation of missing domestic water fluoride concentration data informed by spatial autocorrelation. The context is a state-wide epidemiologic study of pediatric oral health in North Carolina, where domestic water fluoride concentration information was missing for approximately 75% of study participants with clinical data on dental caries. A new machine-learning-based imputation method that combines partitioning around medoids clustering and random forest classification (PAMRF) is developed and implemented. Imputed values are filtered according to allowable error rates or target sample size, depending on the requirements of each application. In leave-one-out cross-validation and simulation studies, PAMRF outperforms four existing imputation approaches: two conventional spatial interpolation methods (inverse-distance weighting, IDW, and universal kriging, UK) and two supervised learning methods (k-nearest neighbors, KNN, and classification and regression trees, CART). The inclusion of multiply imputed values in the estimation of the association between fluoride concentration and dental caries prevalence resulted in essentially no change in PAMRF estimates but substantial gains in precision due to larger effective sample size. PAMRF is a powerful new method for the imputation of missing fluoride values where geographical information exists.
Persistent Identifier: http://hdl.handle.net/10722/334824
ISSN: 1932-6157
2023 Impact Factor: 1.3
2023 SCImago Journal Rankings: 0.954
ISI Accession Number ID: WOS:000774613000026

 

DC Field | Value | Language
dc.contributor.author | Gu, Yu | -
dc.contributor.author | Preisser, John S. | -
dc.contributor.author | Zeng, Donglin | -
dc.contributor.author | Shrestha, Poojan | -
dc.contributor.author | Shah, Molina | -
dc.contributor.author | Simancas-Pallares, Miguel A. | -
dc.contributor.author | Ginnis, Jeannie | -
dc.contributor.author | Divaris, Kimon | -
dc.date.accessioned | 2023-10-20T06:51:00Z | -
dc.date.available | 2023-10-20T06:51:00Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Annals of Applied Statistics, 2022, v. 16, n. 1, p. 551-572 | -
dc.identifier.issn | 1932-6157 | -
dc.identifier.uri | http://hdl.handle.net/10722/334824 | -
dc.language | eng | -
dc.relation.ispartof | Annals of Applied Statistics | -
dc.subject | clustering | -
dc.subject | Missing values | -
dc.subject | multiple imputation | -
dc.subject | random forest | -
dc.subject | spatial interpolation | -
dc.title | PARTITIONING AROUND MEDOIDS CLUSTERING AND RANDOM FOREST CLASSIFICATION FOR GIS-INFORMED IMPUTATION OF FLUORIDE CONCENTRATION DATA | -
dc.type | Article | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1214/21-AOAS1516 | -
dc.identifier.scopus | eid_2-s2.0-85127832461 | -
dc.identifier.volume | 16 | -
dc.identifier.issue | 1 | -
dc.identifier.spage | 551 | -
dc.identifier.epage | 572 | -
dc.identifier.eissn | 1941-7330 | -
dc.identifier.isi | WOS:000774613000026 | -
