File Download: There are no files associated with this item.
Links for fulltext (May Require Subscription):
- Publisher Website: 10.1016/j.ipm.2025.104152
- Scopus: eid_2-s2.0-105001798919
Citations:
- Scopus: 0
Article: Integrative modeling enables ChatGPT to achieve average level of human counselors performance in mental health Q&A
| Title | Integrative modeling enables ChatGPT to achieve average level of human counselors performance in mental health Q&A |
|---|---|
| Authors | Huang, Yinghui; Wang, Weijun; Zhou, Jinyi; Zhang, Liang; Lin, Jionghao; Liu, Hui; Hu, Xiangen; Zhou, Zongkui; Dong, Wanghao |
| Keywords | ChatGPT; Integrative modeling; Large language model; LLMs evaluation; Mental health Q&A; Prompt engineering |
| Issue Date | 1-Sep-2025 |
| Publisher | Elsevier |
| Citation | Information Processing and Management, 2025, v. 62, n. 5 |
| Abstract | Recent advancements in generative artificial intelligence (GenAI), particularly ChatGPT, have demonstrated significant potential in addressing the persistent treatment gap in mental health care. Systematic evaluation of ChatGPT's capabilities in addressing mental health questions is essential for its large-scale application. The current study introduces a computational evaluation framework centered on perceived information quality (PIQ) to quantitatively assess ChatGPT's capabilities. Leveraging datasets of question-answer pairs generated by both humans and ChatGPT, the framework integrates predictive modeling, explainable modeling, and prompt-engineering-based validation to identify intrinsic evaluation metrics and enable automated assessments. Results revealed that unprompted ChatGPT's PIQ is significantly lower than that of human counselors overall, with notable deficiencies such as insufficient conversational length, lower text diversity, and reduced professionalism. Despite not matching the top 25% of human counselors, our evaluation framework improved ChatGPT's mean PIQ by 8.91% to 11.67% across four risk levels. Prompted ChatGPT performed comparably to human counselors in severe-risk (p = 0.0561) and moderate-risk questions (p = 0.7851), and significantly outperformed them in low- and no-risk categories by 6.80% and 4.63%, respectively (p < 0.001). However, undesirable verbal behaviors persist in text diversity and professionalism. These findings validate ChatGPT's capabilities to address mental health questions while cautioning that further research is necessary before LLM-based mental health systems can deliver services comparable to human experts. |
| Persistent Identifier | http://hdl.handle.net/10722/358396 |
| ISSN | 0306-4573 (2023 Impact Factor: 7.4; 2023 SCImago Journal Rankings: 2.134) |
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Huang, Yinghui | - |
| dc.contributor.author | Wang, Weijun | - |
| dc.contributor.author | Zhou, Jinyi | - |
| dc.contributor.author | Zhang, Liang | - |
| dc.contributor.author | Lin, Jionghao | - |
| dc.contributor.author | Liu, Hui | - |
| dc.contributor.author | Hu, Xiangen | - |
| dc.contributor.author | Zhou, Zongkui | - |
| dc.contributor.author | Dong, Wanghao | - |
| dc.date.accessioned | 2025-08-07T00:31:58Z | - |
| dc.date.available | 2025-08-07T00:31:58Z | - |
| dc.date.issued | 2025-09-01 | - |
| dc.identifier.citation | Information Processing and Management, 2025, v. 62, n. 5 | - |
| dc.identifier.issn | 0306-4573 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/358396 | - |
| dc.description.abstract | <p>Recent advancements in generative artificial intelligence (GenAI), particularly ChatGPT, have demonstrated significant potential in addressing the persistent treatment gap in mental health care. Systematic evaluation of ChatGPT's capabilities in addressing mental health questions is essential for its large-scale application. The current study introduces a computational evaluation framework centered on perceived information quality (PIQ) to quantitatively assess ChatGPT's capabilities. Leveraging datasets of question-answer pairs generated by both humans and ChatGPT, the framework integrates predictive modeling, explainable modeling, and prompt-engineering-based validation to identify intrinsic evaluation metrics and enable automated assessments. Results revealed that unprompted ChatGPT's PIQ is significantly lower than that of human counselors overall, with notable deficiencies such as insufficient conversational length, lower text diversity, and reduced professionalism. Despite not matching the top 25% of human counselors, our evaluation framework improved ChatGPT's mean PIQ by 8.91% to 11.67% across four risk levels. Prompted ChatGPT performed comparably to human counselors in severe-risk (p = 0.0561) and moderate-risk questions (p = 0.7851), and significantly outperformed them in low- and no-risk categories by 6.80% and 4.63%, respectively (p < 0.001). However, undesirable verbal behaviors persist in text diversity and professionalism. These findings validate ChatGPT's capabilities to address mental health questions while cautioning that further research is necessary before LLM-based mental health systems can deliver services comparable to human experts.</p> | - |
| dc.language | eng | - |
| dc.publisher | Elsevier | - |
| dc.relation.ispartof | Information Processing and Management | - |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
| dc.subject | ChatGPT | - |
| dc.subject | Integrative modeling | - |
| dc.subject | Large language model | - |
| dc.subject | LLMs evaluation | - |
| dc.subject | Mental health Q&A | - |
| dc.subject | Prompt engineering | - |
| dc.title | Integrative modeling enables ChatGPT to achieve average level of human counselors performance in mental health Q&A | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.1016/j.ipm.2025.104152 | - |
| dc.identifier.scopus | eid_2-s2.0-105001798919 | - |
| dc.identifier.volume | 62 | - |
| dc.identifier.issue | 5 | - |
| dc.identifier.eissn | 1873-5371 | - |
| dc.identifier.issnl | 0306-4573 | - |
