
Article: Comparative Analysis of Chatbot Systems

Title: Comparative Analysis of Chatbot Systems
Authors: Xu, Hengsheng; Wan, Linkun; Li, Yunyin; Liu, Jiaxi; Lau, Adela S.M.
Issue Date: 30-Sep-2025
Publisher: IOS Press
Citation: Frontiers in Artificial Intelligence and Applications, 2025, v. 412
Abstract

Existing research on chatbot evaluation suffers from inconsistent assessment standards, fragmented criteria, and insufficient coverage of critical dimensions like legal compliance and ethical alignment, which hinders reliable benchmarking of chatbots’ performance. Our study proposes a comprehensive framework for such evaluation and systematically compares five chatbot systems: Tidio (Rule-Based), GPT-4o (AI-Powered), Claude 3.5 Sonnet (LLM), Watson Assistant (Enterprise), and Qwen2.5-Max (Multilingual) in terms of their accuracy, safety, legal compliance, generalizability of performance, and ethical alignment. We conclude that while chatbots enhance efficiency in healthcare (97.34% patient education completeness) and e-commerce (30%–40% cost reduction), critical limitations persist. Recommendations include: (1) retrieval-augmented generation (RAG) for hallucination reduction, (2) ethical governance frameworks (e.g., AILuminate), and (3) domain-specialized tuning. Cross-sector collaboration and standardized evaluations are essential for responsible deployment of AI.
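The abstract's first recommendation, retrieval-augmented generation (RAG) for hallucination reduction, can be illustrated with a minimal sketch: retrieve the most relevant passages from a verified store, then prepend them to the prompt so the generator answers from sources rather than parametric memory. The document store, the bag-of-words cosine retriever, and the prompt template below are all illustrative assumptions, not the paper's implementation.

```python
import math
import re
from collections import Counter

# Toy document store standing in for a curated, verified knowledge base.
DOCS = [
    "Tidio is a rule-based chatbot platform for e-commerce support.",
    "Retrieval-augmented generation grounds model answers in retrieved text.",
    "Watson Assistant is an enterprise chatbot system from IBM.",
]

def _vec(text):
    # Bag-of-words term counts; lowercase and strip punctuation.
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))

def _cosine(a, b):
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query, k=1):
    """Return the k documents most similar to the query."""
    q = _vec(query)
    ranked = sorted(DOCS, key=lambda d: _cosine(q, _vec(d)), reverse=True)
    return ranked[:k]

def build_prompt(query):
    """Prepend retrieved evidence so the generator is constrained to it."""
    context = "\n".join(retrieve(query))
    return (f"Context:\n{context}\n\n"
            f"Question: {query}\nAnswer using only the context.")
```

In a production system the bag-of-words retriever would be replaced by dense embeddings and the prompt passed to an LLM; the grounding step is what targets hallucination.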


Persistent Identifier: http://hdl.handle.net/10722/365937
ISSN: 0922-6389
2023 SCImago Journal Rankings: 0.281

 

DC Field: Value
dc.contributor.author: Xu, Hengsheng
dc.contributor.author: Wan, Linkun
dc.contributor.author: Li, Yunyin
dc.contributor.author: Liu, Jiaxi
dc.contributor.author: Lau, Adela S.M.
dc.date.accessioned: 2025-11-12T00:36:38Z
dc.date.available: 2025-11-12T00:36:38Z
dc.date.issued: 2025-09-30
dc.identifier.citation: Frontiers in Artificial Intelligence and Applications, 2025, v. 412
dc.identifier.issn: 0922-6389
dc.identifier.uri: http://hdl.handle.net/10722/365937
dc.description.abstract: Existing research on chatbot evaluation suffers from inconsistent assessment standards, fragmented criteria, and insufficient coverage of critical dimensions like legal compliance and ethical alignment, which hinders reliable benchmarking of chatbots' performance. Our study proposes a comprehensive framework for such evaluation and systematically compares five chatbot systems: Tidio (Rule-Based), GPT-4o (AI-Powered), Claude 3.5 Sonnet (LLM), Watson Assistant (Enterprise), and Qwen2.5-Max (Multilingual) in terms of their accuracy, safety, legal compliance, generalizability of performance, and ethical alignment. We conclude that while chatbots enhance efficiency in healthcare (97.34% patient education completeness) and e-commerce (30%–40% cost reduction), critical limitations persist. Recommendations include: (1) retrieval-augmented generation (RAG) for hallucination reduction, (2) ethical governance frameworks (e.g., AILuminate), and (3) domain-specialized tuning. Cross-sector collaboration and standardized evaluations are essential for responsible deployment of AI.
dc.language: eng
dc.publisher: IOS Press
dc.relation.ispartof: Frontiers in Artificial Intelligence and Applications
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.title: Comparative Analysis of Chatbot Systems
dc.type: Article
dc.description.nature: published_or_final_version
dc.identifier.doi: 10.3233/FAIA250737
dc.identifier.volume: 412
dc.identifier.eissn: 1535-6698
dc.identifier.issnl: 0922-6389
