Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1214/21-AOAS1516
- Scopus: eid_2-s2.0-85127832461
- WOS: WOS:000774613000026
Article: PARTITIONING AROUND MEDOIDS CLUSTERING AND RANDOM FOREST CLASSIFICATION FOR GIS-INFORMED IMPUTATION OF FLUORIDE CONCENTRATION DATA
Title | PARTITIONING AROUND MEDOIDS CLUSTERING AND RANDOM FOREST CLASSIFICATION FOR GIS-INFORMED IMPUTATION OF FLUORIDE CONCENTRATION DATA |
---|---|
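Authors | Gu, Yu; Preisser, John S.; Zeng, Donglin; Shrestha, Poojan; Shah, Molina; Simancas-Pallares, Miguel A.; Ginnis, Jeannie; Divaris, Kimon |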
Keywords | clustering; missing values; multiple imputation; random forest; spatial interpolation |
Issue Date | 2022 |
Citation | Annals of Applied Statistics, 2022, v. 16, n. 1, p. 551-572 |
Abstract | Community water fluoridation is an important component of oral health promotion, as fluoride exposure is a well-documented dental caries-preventive agent. Direct measurements of domestic water fluoride content provide valuable information regarding individuals’ fluoride exposure and thus caries risk; however, they are logistically challenging to carry out at a large scale in oral health research. This article describes the development and evaluation of a novel method for the imputation of missing domestic water fluoride concentration data informed by spatial autocorrelation. The context is a state-wide epidemiologic study of pediatric oral health in North Carolina, where domestic water fluoride concentration information was missing for approximately 75% of study participants with clinical data on dental caries. A new machine-learning-based imputation method that combines partitioning around medoids clustering and random forest classification (PAMRF) is developed and implemented. Imputed values are filtered according to allowable error rates or target sample size, depending on the requirements of each application. In leave-one-out cross-validation and simulation studies, PAMRF outperforms four existing imputation approaches: two conventional spatial interpolation methods (inverse-distance weighting, IDW, and universal kriging, UK) and two supervised learning methods (k-nearest neighbors, KNN, and classification and regression trees, CART). The inclusion of multiply imputed values in the estimation of the association between fluoride concentration and dental caries prevalence resulted in essentially no change in PAMRF estimates but substantial gains in precision due to larger effective sample size. PAMRF is a powerful new method for the imputation of missing fluoride values where geographical information exists. |
Persistent Identifier | http://hdl.handle.net/10722/334824 |
ISSN | 1932-6157 (2023 Impact Factor: 1.3; 2023 SCImago Journal Rankings: 0.954) |
ISI Accession Number ID | WOS:000774613000026 |
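
The abstract names partitioning around medoids (PAM) as the clustering step of PAMRF but does not spell out how that step is configured or how it is combined with the random forest classifier. As a rough illustration of the generic PAM idea only, and not the authors' implementation, the sketch below clusters a small set of made-up one-dimensional concentration values. The function names, the data, the choice of the first k points as starting medoids, and the absolute-difference distance are all assumptions made for this example.

```python
def total_cost(values, medoid_idx):
    """Sum of absolute distances from each value to its nearest medoid."""
    return sum(min(abs(v - values[m]) for m in medoid_idx) for v in values)


def pam_1d(values, k):
    """Swap-based PAM (k-medoids) for one-dimensional data.

    Assumption for this sketch: the first k points serve as initial
    medoids (the classical BUILD step is skipped). Each pass applies the
    single medoid/non-medoid swap that lowers the total cost the most,
    and the loop stops when no swap improves the cost.
    """
    medoids = list(range(k))
    cost = total_cost(values, medoids)
    while True:
        best_cost, best_medoids = cost, None
        for pos in range(k):
            for cand in range(len(values)):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                trial_cost = total_cost(values, trial)
                # Small tolerance guards against floating-point ties.
                if trial_cost < best_cost - 1e-12:
                    best_cost, best_medoids = trial_cost, trial
        if best_medoids is None:
            break
        medoids, cost = best_medoids, best_cost
    # Label each observation with the value of its nearest medoid.
    labels = [
        values[min(medoids, key=lambda m: abs(v - values[m]))] for v in values
    ]
    return medoids, labels, cost


# Hypothetical fluoride concentrations (ppm), invented for illustration.
fluoride_ppm = [0.10, 0.20, 0.15, 0.70, 0.80, 0.75, 1.50, 1.60, 1.55]

medoids, labels, cost = pam_1d(fluoride_ppm, k=3)
print("medoid values:", sorted(fluoride_ppm[m] for m in medoids))
print("cluster label per observation:", labels)
```

Because medoids are actual observations, each cluster is represented by a value that really occurs in the data. A classifier could in principle be trained to predict the resulting labels from geographic covariates, but whether PAMRF works this way is not stated in the abstract.
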
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Gu, Yu | - |
dc.contributor.author | Preisser, John S. | - |
dc.contributor.author | Zeng, Donglin | - |
dc.contributor.author | Shrestha, Poojan | - |
dc.contributor.author | Shah, Molina | - |
dc.contributor.author | Simancas-Pallares, Miguel A. | - |
dc.contributor.author | Ginnis, Jeannie | - |
dc.contributor.author | Divaris, Kimon | - |
dc.date.accessioned | 2023-10-20T06:51:00Z | - |
dc.date.available | 2023-10-20T06:51:00Z | - |
dc.date.issued | 2022 | - |
dc.identifier.citation | Annals of Applied Statistics, 2022, v. 16, n. 1, p. 551-572 | - |
dc.identifier.issn | 1932-6157 | - |
dc.identifier.uri | http://hdl.handle.net/10722/334824 | - |
dc.language | eng | - |
dc.relation.ispartof | Annals of Applied Statistics | - |
dc.subject | clustering | - |
dc.subject | Missing values | - |
dc.subject | multiple imputation | - |
dc.subject | random forest | - |
dc.subject | spatial interpolation | - |
dc.title | PARTITIONING AROUND MEDOIDS CLUSTERING AND RANDOM FOREST CLASSIFICATION FOR GIS-INFORMED IMPUTATION OF FLUORIDE CONCENTRATION DATA | - |
dc.type | Article | - |
dc.description.nature | link_to_subscribed_fulltext | - |
dc.identifier.doi | 10.1214/21-AOAS1516 | - |
dc.identifier.scopus | eid_2-s2.0-85127832461 | - |
dc.identifier.volume | 16 | - |
dc.identifier.issue | 1 | - |
dc.identifier.spage | 551 | - |
dc.identifier.epage | 572 | - |
dc.identifier.eissn | 1941-7330 | - |
dc.identifier.isi | WOS:000774613000026 | - |
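
The abstract reports a leave-one-out cross-validation comparison against inverse-distance weighting (IDW), among other baselines. The sketch below shows what that kind of check can look like for IDW alone. It is a minimal illustration under stated assumptions: planar Euclidean coordinates, a power parameter of 2, mean absolute error as the score, and invented sample data. None of these choices is taken from the article.

```python
import math


def idw_predict(target, known, power=2.0):
    """Inverse-distance-weighted prediction at `target`.

    `known` is a list of ((x, y), value) pairs. Each weight is
    1 / distance**power, and an exact location match returns the
    observed value directly.
    """
    num = den = 0.0
    for (x, y), value in known:
        d = math.hypot(target[0] - x, target[1] - y)
        if d == 0.0:
            return value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def loocv_mae(samples, power=2.0):
    """Leave-one-out mean absolute error of IDW over `samples`."""
    errors = []
    for i, (loc, value) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]  # hold out observation i
        errors.append(abs(idw_predict(loc, rest, power) - value))
    return sum(errors) / len(errors)


# Hypothetical (x, y) locations with fluoride values in ppm, for illustration.
samples = [
    ((0.0, 0.0), 0.20),
    ((1.0, 0.0), 0.25),
    ((0.0, 1.0), 0.30),
    ((5.0, 5.0), 0.90),
]

print("LOOCV mean absolute error:", loocv_mae(samples))
```

Each observation is predicted from all the others, so the score reflects how well the interpolator recovers values it has not seen. The same loop structure applies to any imputation method that can be refit or re-evaluated with one observation held out.
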