Article: The performance of large language models in dentomaxillofacial radiology: a systematic review
| Title | The performance of large language models in dentomaxillofacial radiology: a systematic review |
|---|---|
| Authors | Liu, Zekai; Nalley, Andrew; Hao, Jing; Ai, Qi Yong H; Yeung, Andy Wai Kan; Tanaka, Ray; Hung, Kuo Feng |
| Issue Date | 12-Aug-2025 |
| Publisher | British Institute of Radiology |
| Citation | Dentomaxillofacial Radiology, 2025 |
| Abstract | **Objectives:** This study aimed to systematically review the current performance of large language models (LLMs) in dento-maxillofacial radiology (DMFR). **Methods:** Five electronic databases were searched to identify studies that developed, fine-tuned, or evaluated LLMs for DMFR-related tasks. Extracted data included study purpose, LLM type, image/text source, applied language, dataset characteristics, input and output, performance outcomes, evaluation methods, and reference standards. Customized assessment criteria adapted from the TRIPOD-LLM reporting guideline were used to evaluate the risk of bias in the included studies, specifically regarding the clarity of dataset origin, the robustness of performance evaluation methods, and the validity of the reference standards. **Results:** The initial search yielded 1621 titles, and 19 studies were included. These studies investigated the use of LLMs for tasks including the production and answering of DMFR-related qualification exams and educational questions (*n* = 8), diagnosis and treatment recommendations (*n* = 7), and radiology report generation and patient communication (*n* = 4). LLMs demonstrated varied performance in diagnosing dental conditions, with accuracy ranging from 37% to 92.5% and expert ratings for differential diagnosis and treatment planning between 3.6 and 4.7 on a 5-point scale. For DMFR-related qualification exams and board-style questions, LLMs achieved correctness rates between 33.3% and 86.1%. Automated radiology report generation showed moderate performance, with accuracy ranging from 70.4% to 81.3%. **Conclusions:** LLMs demonstrate promising potential in DMFR, particularly for diagnostic, educational, and report generation tasks. However, their current accuracy, completeness, and consistency remain variable. Further development, validation, and standardization are needed before LLMs can be reliably integrated as supportive tools in clinical workflows and educational settings. |
| Persistent Identifier | http://hdl.handle.net/10722/366805 |
| ISSN | 0250-832X (2023 Impact Factor: 2.9; 2023 SCImago Journal Rankings: 0.816) |
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Liu, Zekai | - |
| dc.contributor.author | Nalley, Andrew | - |
| dc.contributor.author | Hao, Jing | - |
| dc.contributor.author | Ai, Qi Yong H | - |
| dc.contributor.author | Yeung, Andy Wai Kan | - |
| dc.contributor.author | Tanaka, Ray | - |
| dc.contributor.author | Hung, Kuo Feng | - |
| dc.date.accessioned | 2025-11-25T04:22:00Z | - |
| dc.date.available | 2025-11-25T04:22:00Z | - |
| dc.date.issued | 2025-08-12 | - |
| dc.identifier.citation | Dentomaxillofacial Radiology, 2025 | - |
| dc.identifier.issn | 0250-832X | - |
| dc.identifier.uri | http://hdl.handle.net/10722/366805 | - |
| dc.description.abstract | Objectives: This study aimed to systematically review the current performance of large language models (LLMs) in dento-maxillofacial radiology (DMFR). Methods: Five electronic databases were searched to identify studies that developed, fine-tuned, or evaluated LLMs for DMFR-related tasks. Extracted data included study purpose, LLM type, image/text source, applied language, dataset characteristics, input and output, performance outcomes, evaluation methods, and reference standards. Customized assessment criteria adapted from the TRIPOD-LLM reporting guideline were used to evaluate the risk of bias in the included studies, specifically regarding the clarity of dataset origin, the robustness of performance evaluation methods, and the validity of the reference standards. Results: The initial search yielded 1621 titles, and 19 studies were included. These studies investigated the use of LLMs for tasks including the production and answering of DMFR-related qualification exams and educational questions (n = 8), diagnosis and treatment recommendations (n = 7), and radiology report generation and patient communication (n = 4). LLMs demonstrated varied performance in diagnosing dental conditions, with accuracy ranging from 37% to 92.5% and expert ratings for differential diagnosis and treatment planning between 3.6 and 4.7 on a 5-point scale. For DMFR-related qualification exams and board-style questions, LLMs achieved correctness rates between 33.3% and 86.1%. Automated radiology report generation showed moderate performance, with accuracy ranging from 70.4% to 81.3%. Conclusions: LLMs demonstrate promising potential in DMFR, particularly for diagnostic, educational, and report generation tasks. However, their current accuracy, completeness, and consistency remain variable. Further development, validation, and standardization are needed before LLMs can be reliably integrated as supportive tools in clinical workflows and educational settings. | - |
| dc.language | eng | - |
| dc.publisher | British Institute of Radiology | - |
| dc.relation.ispartof | Dentomaxillofacial Radiology | - |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
| dc.title | The performance of large language models in dentomaxillofacial radiology: a systematic review | - |
| dc.type | Article | - |
| dc.description.nature | published_or_final_version | - |
| dc.identifier.doi | 10.1093/dmfr/twaf060 | - |
| dc.identifier.eissn | 1476-542X | - |
| dc.identifier.issnl | 0250-832X | - |
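The Dublin Core fields above follow the standard `oai_dc` metadata schema, so records like this one can typically be harvested programmatically from repositories that expose an OAI-PMH interface. Below is a minimal Python sketch of such a harvest; the endpoint URL and record identifier are hypothetical placeholders for illustration, since this page does not state the repository's actual harvesting interface.

```python
# Minimal sketch: harvesting one Dublin Core record over OAI-PMH.
# The endpoint and identifier below are illustrative assumptions,
# NOT confirmed values for this repository.
import requests
import xml.etree.ElementTree as ET

OAI_ENDPOINT = "https://repository.example.org/oai/request"   # hypothetical
RECORD_ID = "oai:repository.example.org:10722/366805"         # hypothetical

DC_NS = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace


def fetch_dc_record(endpoint: str, identifier: str) -> dict[str, list[str]]:
    """Fetch one record via the GetRecord verb and return its DC fields."""
    resp = requests.get(
        endpoint,
        params={
            "verb": "GetRecord",
            "identifier": identifier,
            "metadataPrefix": "oai_dc",
        },
        timeout=30,
    )
    resp.raise_for_status()
    root = ET.fromstring(resp.content)

    # Collect repeated elements (e.g. multiple dc:creator entries) into lists.
    fields: dict[str, list[str]] = {}
    for elem in root.iter():
        if elem.tag.startswith(DC_NS) and elem.text:
            name = elem.tag.removeprefix(DC_NS)
            fields.setdefault(name, []).append(elem.text.strip())
    return fields


if __name__ == "__main__":
    record = fetch_dc_record(OAI_ENDPOINT, RECORD_ID)
    print(record.get("title"))    # article title
    print(record.get("creator"))  # author names, e.g. "Hung, Kuo Feng"
```

Note that `oai_dc` maps `dc.contributor.author` to repeated `dc:creator` elements, which is why the sketch accumulates each field name into a list rather than a single value.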

