Article: Data-centric artificial intelligence in oncology: a systematic review assessing data quality in machine learning models for head and neck cancer

Title: Data-centric artificial intelligence in oncology: a systematic review assessing data quality in machine learning models for head and neck cancer
Authors: Adeoye, J; Hui, LL; Su, YX
Keywords: Artificial intelligence; Data quality; Data-centric AI; Head and neck cancer; Machine learning; Review
Issue Date: 4-Mar-2023
Publisher: SpringerOpen
Citation: Journal of Big Data, 2023, v. 10, n. 1
Abstract: Machine learning models have been increasingly considered to model head and neck cancer outcomes for improved screening, diagnosis, treatment, and prognostication of the disease. As the concept of data-centric artificial intelligence is still incipient in healthcare systems, little is known about the data quality of the models proposed for clinical utility. This is important as it supports the generalizability of the models and data standardization. Therefore, this study overviews the quality of structured and unstructured data used for machine learning model construction in head and neck cancer. Relevant studies reporting on the use of machine learning models based on structured and unstructured custom datasets between January 2016 and June 2022 were sourced from PubMed, EMBASE, Scopus, and Web of Science electronic databases. The Prediction model Risk of Bias Assessment Tool (PROBAST) was used to assess the quality of individual studies before comprehensive data quality parameters were assessed according to the type of dataset used for model construction. A total of 159 studies were included in the review; 106 utilized structured datasets while 53 utilized unstructured datasets. Data quality assessments were deliberately performed for 14.2% of structured datasets and 11.3% of unstructured datasets before model construction. Class imbalance and data fairness were the most common limitations in data quality for both types of datasets, while outlier detection and lack of representative outcome classes were common in structured and unstructured datasets respectively. Furthermore, this review found that class imbalance reduced the discriminatory performance for models based on structured datasets, while higher image resolution and good class overlap resulted in better model performance using unstructured datasets during internal validation.
Overall, data quality was infrequently assessed before the construction of ML models in head and neck cancer irrespective of the use of structured or unstructured datasets. To improve model generalizability, the assessments discussed in this study should be introduced during model construction to achieve data-centric intelligent systems for head and neck cancer management.
Persistent Identifier: http://hdl.handle.net/10722/337591
ISSN: 2196-1115
2021 Impact Factor: 10.835
2020 SCImago Journal Rankings: 1.031
ISI Accession Number ID: WOS:000943314200001

 

DC Field: Value
dc.contributor.author: Adeoye, J
dc.contributor.author: Hui, LL
dc.contributor.author: Su, YX
dc.date.accessioned: 2024-03-11T10:22:19Z
dc.date.available: 2024-03-11T10:22:19Z
dc.date.issued: 2023-03-04
dc.identifier.citation: Journal of Big Data, 2023, v. 10, n. 1
dc.identifier.issn: 2196-1115
dc.identifier.uri: http://hdl.handle.net/10722/337591
dc.language: eng
dc.publisher: SpringerOpen
dc.relation.ispartof: Journal of Big Data
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject: Artificial intelligence
dc.subject: Data quality
dc.subject: Data-centric AI
dc.subject: Head and neck cancer
dc.subject: Machine learning
dc.subject: Review
dc.title: Data-centric artificial intelligence in oncology: a systematic review assessing data quality in machine learning models for head and neck cancer
dc.type: Article
dc.identifier.doi: 10.1186/s40537-023-00703-w
dc.identifier.scopus: eid_2-s2.0-85149912660
dc.identifier.volume: 10
dc.identifier.issue: 1
dc.identifier.eissn: 2196-1115
dc.identifier.isi: WOS:000943314200001
dc.publisher.place: LONDON
dc.identifier.issnl: 2196-1115
