
Article: The performance of large language models in dentomaxillofacial radiology: a systematic review

Title: The performance of large language models in dentomaxillofacial radiology: a systematic review
Authors: Liu, Zekai; Nalley, Andrew; Hao, Jing; Ai, Qi Yong H; Yeung, Andy Wai Kan; Tanaka, Ray; Hung, Kuo Feng
Issue Date: 12-Aug-2025
Publisher: British Institute of Radiology
Citation: Dentomaxillofacial Radiology, 2025
Abstract

Objectives

This study aimed to systematically review the current performance of large language models (LLMs) in dentomaxillofacial radiology (DMFR).

Methods

Five electronic databases were searched to identify studies that developed, fine-tuned, or evaluated LLMs for DMFR-related tasks. Extracted data included study purpose, LLM type, image/text sources, applied language, dataset characteristics, inputs and outputs, performance outcomes, evaluation methods, and reference standards. Customized assessment criteria adapted from the TRIPOD-LLM reporting guideline were used to evaluate the risk of bias in the included studies, specifically regarding the clarity of dataset origin, the robustness of the performance evaluation methods, and the validity of the reference standards.

Results

The initial search yielded 1621 titles, of which 19 studies were included. These studies investigated the use of LLMs for tasks including the generation and answering of DMFR-related qualification-exam and educational questions (n = 8), diagnosis and treatment recommendations (n = 7), and radiology report generation and patient communication (n = 4). LLMs demonstrated varied performance in diagnosing dental conditions, with accuracy ranging from 37% to 92.5% and expert ratings for differential diagnosis and treatment planning ranging from 3.6 to 4.7 on a 5-point scale. On DMFR-related qualification exams and board-style questions, LLMs achieved correctness rates between 33.3% and 86.1%. Automated radiology report generation showed moderate performance, with accuracy ranging from 70.4% to 81.3%.

Conclusions

LLMs demonstrate promising potential in DMFR, particularly for diagnostic, educational, and report generation tasks. However, their current accuracy, completeness, and consistency remain variable. Further development, validation, and standardization are needed before LLMs can be reliably integrated as supportive tools in clinical workflows and educational settings.


Persistent Identifier: http://hdl.handle.net/10722/366805
ISSN: 0250-832X
2023 Impact Factor: 2.9
2023 SCImago Journal Rankings: 0.816


DC Field: Value
dc.contributor.author: Liu, Zekai
dc.contributor.author: Nalley, Andrew
dc.contributor.author: Hao, Jing
dc.contributor.author: Ai, Qi Yong H
dc.contributor.author: Yeung, Andy Wai Kan
dc.contributor.author: Tanaka, Ray
dc.contributor.author: Hung, Kuo Feng
dc.date.accessioned: 2025-11-25T04:22:00Z
dc.date.available: 2025-11-25T04:22:00Z
dc.date.issued: 2025-08-12
dc.identifier.citation: Dentomaxillofacial Radiology, 2025
dc.identifier.issn: 0250-832X
dc.identifier.uri: http://hdl.handle.net/10722/366805
dc.description.abstract: (abstract as given above)
dc.language: eng
dc.publisher: British Institute of Radiology
dc.relation.ispartof: Dentomaxillofacial Radiology
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.title: The performance of large language models in dentomaxillofacial radiology: a systematic review
dc.type: Article
dc.description.nature: published_or_final_version
dc.identifier.doi: 10.1093/dmfr/twaf060
dc.identifier.eissn: 1476-542X
dc.identifier.issnl: 0250-832X

Export: this record can be exported via the repository's OAI-PMH interface in XML formats, or in other non-XML formats.
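
As a concrete illustration of the OAI-PMH export mentioned above, the following is a minimal Python sketch that requests this record as unqualified Dublin Core (the same field/value pairs listed in the table) and prints each element. It assumes the repository exposes a standard OAI-PMH endpoint; the endpoint URL and OAI identifier below are hypothetical placeholders, not confirmed values for this repository.

import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint and identifier; the real values depend on the
# repository's OAI-PMH configuration (the identifier is typically derived
# from the handle, here 10722/366805).
OAI_ENDPOINT = "https://repository.example.edu/oai/request"
OAI_IDENTIFIER = "oai:repository.example.edu:10722/366805"

# GetRecord is the standard OAI-PMH verb for fetching a single record;
# metadataPrefix=oai_dc asks for unqualified Dublin Core.
params = urllib.parse.urlencode({
    "verb": "GetRecord",
    "metadataPrefix": "oai_dc",
    "identifier": OAI_IDENTIFIER,
})

with urllib.request.urlopen(f"{OAI_ENDPOINT}?{params}") as response:
    tree = ET.parse(response)

# Dublin Core elements carry the dc namespace defined by the protocol.
DC_NS = "{http://purl.org/dc/elements/1.1/}"
for element in tree.iter():
    if element.tag.startswith(DC_NS) and element.text:
        field = element.tag[len(DC_NS):]  # strip the namespace prefix
        print(f"dc.{field}: {element.text.strip()}")

Running such a request against a real endpoint would return the XML envelope defined by the protocol, with the dc.title, dc.contributor, and related fields shown in the table above inside its metadata element.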