Links for fulltext
(May Require Subscription)
- Publisher Website: 10.3390/diagnostics15182315
- Scopus: eid_2-s2.0-105017026789
Article: Performance of a Vision-Language Model in Detecting Common Dental Conditions on Panoramic Radiographs Using Different Tooth Numbering Systems
| Title | Performance of a Vision-Language Model in Detecting Common Dental Conditions on Panoramic Radiographs Using Different Tooth Numbering Systems |
|---|---|
| Authors | Liu, Zekai; Ai, Qi Yong H.; Yeung, Andy Wai Kan; Tanaka, Ray; Nalley, Andrew; Hung, Kuo Feng |
| Keywords | artificial intelligence; dentistry; diagnostic accuracy; large language models; panoramic radiographs; vision-language models |
| Issue Date | 1-Sep-2025 |
| Publisher | MDPI |
| Citation | Diagnostics, 2025, v. 15, n. 18 |
| Abstract | Objectives: The aim of this study was to evaluate the performance of GPT-4o in identifying nine common dental conditions on panoramic radiographs, both overall and at specific tooth sites, and to assess whether the use of different tooth numbering systems (FDI and Universal) in prompts would affect its diagnostic accuracy. Methods: Fifty panoramic radiographs exhibiting various common dental conditions including missing teeth, impacted teeth, caries, endodontically treated teeth, teeth with restorations, periapical lesions, periodontal bone loss, tooth fractures, cracks, retained roots, dental implants, osteolytic lesions, and osteosclerosis were included. Each image was evaluated twice by GPT-4o in May 2025, using structured prompts based on either the FDI or Universal tooth numbering system, to identify the presence of these conditions at specific tooth sites or regions. GPT-4o responses were compared to a consensus reference standard established by an oral-maxillofacial radiology team. GPT-4o’s performance was evaluated using balanced accuracy, sensitivity, specificity, and F1 score both at the patient and tooth levels. Results: A total of 100 GPT-4o responses were generated. At the patient level, balanced accuracy ranged from 46.25% to 98.83% (FDI) and 49.75% to 92.86% (Universal), with the highest accuracies for dental implants (92.86–98.83%). F1-scores and sensitivities were highest for implants, missing, and impacted teeth, but zero for caries, periapical lesions, and fractures. Specificity was generally high across conditions. Notable discrepancies were observed between patient- and tooth-level performance, especially for implants and restorations. GPT-4o’s performance was similar between using the two numbering systems. Conclusions: GPT-4o demonstrated superior performance in detecting dental implants and treated or restored teeth but inferior performance for caries, periapical lesions, and fractures. Diagnostic accuracy was higher at the patient level than at the tooth level, with similar performances for both numbering systems. Future studies with larger, more diverse datasets and multiple models are needed. |
| Persistent Identifier | http://hdl.handle.net/10722/366108 |
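
The abstract reports balanced accuracy, sensitivity, specificity, and F1 score. As a minimal sketch of how these four metrics follow from confusion-matrix counts, the function below uses the standard definitions; it is not the authors' evaluation code. The function name `diagnostic_metrics`, the convention of returning 0.0 when a denominator is zero, and the example counts are all assumptions made for illustration and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic metrics from confusion-matrix counts.

    tp/fp/tn/fn are true/false positives/negatives for one condition.
    Assumption: a ratio with a zero denominator is reported as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)     # positive predictive value
    balanced_accuracy = (sensitivity + specificity) / 2
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts (not from the paper): a condition that is never
    # detected correctly, with one false alarm among 46 negative cases.
    for name, value in diagnostic_metrics(tp=0, fp=1, tn=45, fn=4).items():
        print(f"{name}: {value:.4f}")
```

With these hypothetical counts, sensitivity and F1 are both zero while specificity stays high, so balanced accuracy lands just under 50%. This shows how a condition with zero sensitivity can still have a balanced accuracy near the lower end of the range quoted in the abstract.
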
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Liu, Zekai | - |
| dc.contributor.author | Ai, Qi Yong H. | - |
| dc.contributor.author | Yeung, Andy Wai Kan | - |
| dc.contributor.author | Tanaka, Ray | - |
| dc.contributor.author | Nalley, Andrew | - |
| dc.contributor.author | Hung, Kuo Feng | - |
| dc.date.accessioned | 2025-11-15T00:35:35Z | - |
| dc.date.available | 2025-11-15T00:35:35Z | - |
| dc.date.issued | 2025-09-01 | - |
| dc.identifier.citation | Diagnostics, 2025, v. 15, n. 18 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/366108 | - |
| dc.description.abstract | Objectives: The aim of this study was to evaluate the performance of GPT-4o in identifying nine common dental conditions on panoramic radiographs, both overall and at specific tooth sites, and to assess whether the use of different tooth numbering systems (FDI and Universal) in prompts would affect its diagnostic accuracy. Methods: Fifty panoramic radiographs exhibiting various common dental conditions including missing teeth, impacted teeth, caries, endodontically treated teeth, teeth with restorations, periapical lesions, periodontal bone loss, tooth fractures, cracks, retained roots, dental implants, osteolytic lesions, and osteosclerosis were included. Each image was evaluated twice by GPT-4o in May 2025, using structured prompts based on either the FDI or Universal tooth numbering system, to identify the presence of these conditions at specific tooth sites or regions. GPT-4o responses were compared to a consensus reference standard established by an oral-maxillofacial radiology team. GPT-4o’s performance was evaluated using balanced accuracy, sensitivity, specificity, and F1 score both at the patient and tooth levels. Results: A total of 100 GPT-4o responses were generated. At the patient level, balanced accuracy ranged from 46.25% to 98.83% (FDI) and 49.75% to 92.86% (Universal), with the highest accuracies for dental implants (92.86–98.83%). F1-scores and sensitivities were highest for implants, missing, and impacted teeth, but zero for caries, periapical lesions, and fractures. Specificity was generally high across conditions. Notable discrepancies were observed between patient- and tooth-level performance, especially for implants and restorations. GPT-4o’s performance was similar between using the two numbering systems. Conclusions: GPT-4o demonstrated superior performance in detecting dental implants and treated or restored teeth but inferior performance for caries, periapical lesions, and fractures. Diagnostic accuracy was higher at the patient level than at the tooth level, with similar performances for both numbering systems. Future studies with larger, more diverse datasets and multiple models are needed. | - |
| dc.language | eng | - |
| dc.publisher | MDPI | - |
| dc.relation.ispartof | Diagnostics | - |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
| dc.subject | artificial intelligence | - |
| dc.subject | dentistry | - |
| dc.subject | diagnostic accuracy | - |
| dc.subject | large language models | - |
| dc.subject | panoramic radiographs | - |
| dc.subject | vision-language models | - |
| dc.title | Performance of a Vision-Language Model in Detecting Common Dental Conditions on Panoramic Radiographs Using Different Tooth Numbering Systems | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.3390/diagnostics15182315 | - |
| dc.identifier.scopus | eid_2-s2.0-105017026789 | - |
| dc.identifier.volume | 15 | - |
| dc.identifier.issue | 18 | - |
| dc.identifier.eissn | 2075-4418 | - |
| dc.identifier.issnl | 2075-4418 | - |
