Article: Integrative modeling enables ChatGPT to achieve average level of human counselors performance in mental health Q&A

Title: Integrative modeling enables ChatGPT to achieve average level of human counselors performance in mental health Q&A
Authors: Huang, Yinghui; Wang, Weijun; Zhou, Jinyi; Zhang, Liang; Lin, Jionghao; Liu, Hui; Hu, Xiangen; Zhou, Zongkui; Dong, Wanghao
Keywords: ChatGPT; Integrative modeling; Large language model; LLMs evaluation; Mental health Q&A; Prompt engineering
Issue Date: 1-Sep-2025
Publisher: Elsevier
Citation: Information Processing and Management, 2025, v. 62, n. 5
Abstract

Recent advancements in generative artificial intelligence (GenAI), particularly ChatGPT, have demonstrated significant potential in addressing the persistent treatment gap in mental health care. Systematic evaluation of ChatGPT's capabilities in addressing mental health questions is essential for its large-scale application. The current study introduces a computational evaluation framework centered on perceived information quality (PIQ) to quantitatively assess ChatGPT's capabilities. Leveraging datasets of question-answer pairs generated by both humans and ChatGPT, the framework integrates predictive modeling, explainable modeling, and prompt-engineering-based validation to identify intrinsic evaluation metrics and enable automated assessments. Results revealed that unprompted ChatGPT's PIQ was significantly lower than that of human counselors overall, with notable deficiencies such as insufficient conversational length, lower text diversity, and reduced professionalism. Although prompted ChatGPT still did not match the top 25% of human counselors, our evaluation framework improved its mean PIQ by 8.91% to 11.67% across four risk levels. Prompted ChatGPT performed comparably to human counselors on severe-risk (p = 0.0561) and moderate-risk (p = 0.7851) questions, and significantly outperformed them in the low- and no-risk categories by 6.80% and 4.63%, respectively (p < 0.001). However, undesirable verbal behaviors persisted in text diversity and professionalism. These findings validate ChatGPT's capability to address mental health questions while cautioning that further research is necessary before LLM-based mental health systems can deliver services comparable to human experts.
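
As context for the statistics quoted above, the following is a minimal Python sketch, assuming the group comparisons are two-sample Welch's t-tests on per-answer PIQ scores; the paper's exact test, data, and variable names are not reproduced here, and everything below is an illustrative placeholder.

    # Hypothetical sketch: per-risk-level comparison of PIQ scores between
    # human counselors and prompted ChatGPT. The data, sample sizes, and the
    # choice of Welch's t-test are placeholders, not the authors' code.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Simulated PIQ scores (arbitrary scale) for each question risk level.
    piq = {
        "severe":   {"human": rng.normal(3.8, 0.5, 200), "chatgpt": rng.normal(3.7, 0.5, 200)},
        "moderate": {"human": rng.normal(3.6, 0.5, 200), "chatgpt": rng.normal(3.6, 0.5, 200)},
        "low":      {"human": rng.normal(3.5, 0.5, 200), "chatgpt": rng.normal(3.7, 0.5, 200)},
        "no-risk":  {"human": rng.normal(3.4, 0.5, 200), "chatgpt": rng.normal(3.6, 0.5, 200)},
    }

    for level, g in piq.items():
        human, gpt = g["human"], g["chatgpt"]
        # Welch's two-sample t-test (unequal variances assumed).
        t_stat, p_val = stats.ttest_ind(gpt, human, equal_var=False)
        # Relative difference in mean PIQ, as a percentage of the human mean.
        rel = 100.0 * (gpt.mean() - human.mean()) / human.mean()
        print(f"{level:>8}: mean PIQ diff = {rel:+.2f}%  (t = {t_stat:.3f}, p = {p_val:.4f})")

The printed relative differences correspond to the kind of 6.80% and 4.63% gaps reported in the abstract; Welch's variant (equal_var=False) is used here because the two groups need not share a variance.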


Persistent Identifier: http://hdl.handle.net/10722/358396
ISSN: 0306-4573
2023 Impact Factor: 7.4
2023 SCImago Journal Rankings: 2.134

 

DC Field | Value | Language
dc.contributor.author | Huang, Yinghui | -
dc.contributor.author | Wang, Weijun | -
dc.contributor.author | Zhou, Jinyi | -
dc.contributor.author | Zhang, Liang | -
dc.contributor.author | Lin, Jionghao | -
dc.contributor.author | Liu, Hui | -
dc.contributor.author | Hu, Xiangen | -
dc.contributor.author | Zhou, Zongkui | -
dc.contributor.author | Dong, Wanghao | -
dc.date.accessioned | 2025-08-07T00:31:58Z | -
dc.date.available | 2025-08-07T00:31:58Z | -
dc.date.issued | 2025-09-01 | -
dc.identifier.citation | Information Processing and Management, 2025, v. 62, n. 5 | -
dc.identifier.issn | 0306-4573 | -
dc.identifier.uri | http://hdl.handle.net/10722/358396 | -
dc.description.abstract | Recent advancements in generative artificial intelligence (GenAI), particularly ChatGPT, have demonstrated significant potential in addressing the persistent treatment gap in mental health care. Systematic evaluation of ChatGPT's capabilities in addressing mental health questions is essential for its large-scale application. The current study introduces a computational evaluation framework centered on perceived information quality (PIQ) to quantitatively assess ChatGPT's capabilities. Leveraging datasets of question-answer pairs generated by both humans and ChatGPT, the framework integrates predictive modeling, explainable modeling, and prompt-engineering-based validation to identify intrinsic evaluation metrics and enable automated assessments. Results revealed that unprompted ChatGPT's PIQ was significantly lower than that of human counselors overall, with notable deficiencies such as insufficient conversational length, lower text diversity, and reduced professionalism. Although prompted ChatGPT still did not match the top 25% of human counselors, our evaluation framework improved its mean PIQ by 8.91% to 11.67% across four risk levels. Prompted ChatGPT performed comparably to human counselors on severe-risk (p = 0.0561) and moderate-risk (p = 0.7851) questions, and significantly outperformed them in the low- and no-risk categories by 6.80% and 4.63%, respectively (p < 0.001). However, undesirable verbal behaviors persisted in text diversity and professionalism. These findings validate ChatGPT's capability to address mental health questions while cautioning that further research is necessary before LLM-based mental health systems can deliver services comparable to human experts. | -
dc.language | eng | -
dc.publisher | Elsevier | -
dc.relation.ispartof | Information Processing and Management | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject | ChatGPT | -
dc.subject | Integrative modeling | -
dc.subject | Large language model | -
dc.subject | LLMs evaluation | -
dc.subject | Mental health Q&A | -
dc.subject | Prompt engineering | -
dc.title | Integrative modeling enables ChatGPT to achieve average level of human counselors performance in mental health Q&A | -
dc.type | Article | -
dc.identifier.doi | 10.1016/j.ipm.2025.104152 | -
dc.identifier.scopus | eid_2-s2.0-105001798919 | -
dc.identifier.volume | 62 | -
dc.identifier.issue | 5 | -
dc.identifier.eissn | 1873-5371 | -
dc.identifier.issnl | 0306-4573 | -
