postgraduate thesis: Application of machine learning in the development of risk prediction models of cardiovascular and renal complications in primary care Chinese diabetes mellitus patients

Title: Application of machine learning in the development of risk prediction models of cardiovascular and renal complications in primary care Chinese diabetes mellitus patients
Authors: Dong, Weinan (董伟楠)
Advisors: Lam, CLK; Wan, YFE; Wong, CKH
Issue Date: 2022
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Dong, W. [董伟楠]. (2022). Application of machine learning in the development of risk prediction models of cardiovascular and renal complications in primary care Chinese diabetes mellitus patients. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: Type 2 Diabetes Mellitus (T2DM) increases the risk of cardiovascular diseases (CVD) and end-stage renal disease (ESRD). Risk-stratified, individualized management of T2DM based on the predicted 10-year risk of complications is recommended, but accurate risk prediction models for Chinese T2DM patients are lacking. This study aimed to develop and validate 10-year CVD and ESRD risk prediction models for Chinese T2DM patients managed in primary care using machine learning (ML) methods, which can support clinical decision making and facilitate individualized interventions. This was a 10-year population-based retrospective cohort study. A total of 141,516 Chinese T2DM patients aged 18 years or older, without a history of CVD or ESRD and managed in public primary care clinics in 2008, were included and followed up until December 2017. Data on relevant, routinely available predictors and outcomes were extracted from the computerized medical records of the Hospital Authority of Hong Kong Clinical Management System. An ML-based missing data imputation method, the generative adversarial imputation network (GAIN), was evaluated in a simulation study and then applied to impute the missing values in the cohort data. Two-thirds of the subjects were randomly selected to develop sex-specific risk prediction models for CVD and ESRD, respectively. Extreme gradient boosting, the Boruta method, and Shapley value-based methods were applied for ML modelling, predictor selection, and model interpretation, respectively. Final ML models were selected based on both statistical significance and clinical relevance. The remaining one-third of the subjects were used as the validation sample to evaluate model performance, in terms of discrimination (Harrell’s C statistic) and calibration, for the overall cohort and for subgroups. The performance of the ML models was compared with that of Cox proportional hazards regression models.
During a median follow-up period of 9.75 years, 32,445 (22.9%) subjects developed CVD and 8,496 (6.0%) subjects developed ESRD. Age, DM duration, urine albumin to creatinine ratio (urine ACR), eGFR, systolic blood pressure variability, and HbA1c variability were the most important predictors for both CVD and ESRD. The ML models identified nonlinear effects of some predictors, particularly the U-shaped effects of eGFR and BMI on the risk of CVD. The ML models showed Harrell’s C of more than 0.80 for CVD prediction and 0.90 for ESRD prediction, and good calibration (adjusted Hosmer-Lemeshow test p>0.05). The ML models performed significantly better than the Cox regression models, both overall and in subgroups, and achieved better risk stratification for individual patients. However, the prediction models were less accurate in patients older than 70 years and in some subgroups with specific clustering of risk factors. ML methods have been successfully applied to develop accurate prediction models of the 10-year risk of CVD and ESRD for Chinese T2DM patients. Renal function indicators and the variabilities of blood pressure and HbA1c were strong predictors for both CVD and ESRD and deserve more clinical attention. The models have been deployed as a web-based calculator and risk stratification charts that can be made available at the point of care. The next item on the research agenda is to evaluate the feasibility and effectiveness of embedding them in routine primary care T2DM management.
Degree: Doctor of Philosophy
Subject: Type 2 diabetes - Complications; Cardiovascular system - Diseases; Chronic renal failure
Dept/Program: Family Medicine and Primary Care
Persistent Identifier: http://hdl.handle.net/10722/322934
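
The abstract reports discrimination as Harrell’s C statistic. As an illustration only, the following is a minimal pure-Python sketch of how that statistic can be computed for right-censored follow-up data. It is not the thesis's implementation: the function name `harrell_c`, the toy data, and the choice to skip pairs with tied follow-up times are all assumptions made for this example.

```python
from typing import Sequence


def harrell_c(times: Sequence[float],
              events: Sequence[int],
              risk_scores: Sequence[float]) -> float:
    """Harrell's C for right-censored data (illustrative sketch).

    A pair (i, j) is comparable when subject i had an observed event
    and subject j was still under follow-up after that time. The pair
    is concordant when the model gave subject i the higher risk score.
    Tied risk scores count as half. Pairs with tied follow-up times
    are skipped here for simplicity (an assumption of this sketch).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up years, event flag (1 = event, 0 = censored),
# and predicted risk. None of these values come from the thesis.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.1, 0.4, 0.7]
print(harrell_c(times, events, risks))  # 0.6
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of event times by predicted risk, which is the scale on which the reported values above 0.80 should be read.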

 

DC Field | Value | Language
dc.contributor.advisor | Lam, CLK | -
dc.contributor.advisor | Wan, YFE | -
dc.contributor.advisor | Wong, CKH | -
dc.contributor.author | Dong, Weinan | -
dc.contributor.author | 董伟楠 | -
dc.date.accessioned | 2022-11-18T10:41:54Z | -
dc.date.available | 2022-11-18T10:41:54Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Dong, W. [董伟楠]. (2022). Application of machine learning in the development of risk prediction models of cardiovascular and renal complications in primary care Chinese diabetes mellitus patients. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/322934 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Type 2 diabetes - Complications | -
dc.subject.lcsh | Cardiovascular system - Diseases | -
dc.subject.lcsh | Chronic renal failure | -
dc.title | Application of machine learning in the development of risk prediction models of cardiovascular and renal complications in primary care Chinese diabetes mellitus patients | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Family Medicine and Primary Care | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2022 | -
dc.identifier.mmsid | 991044609103403414 | -
