Article: Evaluating the performance of ChatGPT and GPT-4o in coding classroom discourse data: A study of synchronous online mathematics instruction

Title: Evaluating the performance of ChatGPT and GPT-4o in coding classroom discourse data: A study of synchronous online mathematics instruction
Authors: Xu, Simin; Huang, Xiaowei; Lo, Chung Kwan; Chen, Gaowei; Jong, Morris Siu yung
Keywords: ChatGPT; Classroom discourse analysis; GPT-4o; Mathematics instruction; Professional development
Issue Date: 1-Dec-2024
Publisher: Elsevier
Citation: Computers and Education: Artificial Intelligence, 2024, v. 7
Abstract

High-quality instruction is essential to facilitating student learning, prompting many professional development (PD) programmes for teachers to focus on improving classroom dialogue. However, during PD programmes, analysing discourse data is time-consuming, delaying feedback on teachers' performance and potentially impairing the programmes' effectiveness. We therefore explored the use of ChatGPT (a fine-tuned GPT-3.5 series model) and GPT-4o to automate the coding of classroom discourse data. We equipped these AI tools with a codebook designed for mathematics discourse and academically productive talk. Our dataset consisted of over 400 authentic talk turns in Chinese from synchronous online mathematics lessons. The coding outcomes of ChatGPT and GPT-4o were quantitatively compared against a human standard. Qualitative analysis was conducted to understand their coding decisions. The overall agreement between the human standard, ChatGPT output, and GPT-4o output was moderate (Fleiss's Kappa = 0.46) when classifying talk turns into major categories. Pairwise comparisons indicated that GPT-4o (Cohen's Kappa = 0.69) had better performance than ChatGPT (Cohen's Kappa = 0.33). However, at the code level, the performance of both AI tools was unsatisfactory. Based on the identified competences and weaknesses, we propose a two-stage approach to classroom discourse analysis. Specifically, GPT-4o can be employed for the initial category-level analysis, following which teacher educators can conduct a more detailed code-level analysis and refine the coding outcomes. This approach can facilitate timely provision of analytical resources for teachers to reflect on their teaching practices.
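The category-level agreement figures reported above (Fleiss's Kappa = 0.46 across the three sources; Cohen's Kappa = 0.69 for GPT-4o and 0.33 for ChatGPT against the human standard) are standard inter-rater statistics. The sketch below is illustrative only and is not the authors' code: the category names and the short label lists are hypothetical placeholders standing in for the study's roughly 400 human/ChatGPT/GPT-4o labels.

```python
# Illustrative sketch (not the authors' code) of the agreement statistics reported
# in the abstract. Category names and the five talk turns are hypothetical placeholders.
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# One category label per talk turn from each "rater".
human   = ["elicitation", "response", "evaluation", "elicitation", "other"]
chatgpt = ["elicitation", "response", "elicitation", "elicitation", "other"]
gpt4o   = ["elicitation", "response", "evaluation", "elicitation", "response"]

# Pairwise agreement with the human standard (Cohen's kappa).
print("ChatGPT vs human:", cohen_kappa_score(human, chatgpt))
print("GPT-4o vs human: ", cohen_kappa_score(human, gpt4o))

# Overall three-way agreement (Fleiss's kappa): one row per talk turn,
# aggregated into per-category counts before computing the statistic.
table, _ = aggregate_raters(list(zip(human, chatgpt, gpt4o)))
print("Human/ChatGPT/GPT-4o:", fleiss_kappa(table))
```

Applied to the full set of coded talk turns, the same two calls yield the pairwise and three-way kappas of the kind reported in the abstract.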


Persistent Identifier: http://hdl.handle.net/10722/351330
ISSN: 2666-920X
2023 SCImago Journal Rankings: 3.227

 

DC Field: Value
dc.contributor.author: Xu, Simin
dc.contributor.author: Huang, Xiaowei
dc.contributor.author: Lo, Chung Kwan
dc.contributor.author: Chen, Gaowei
dc.contributor.author: Jong, Morris Siu yung
dc.date.accessioned: 2024-11-20T00:38:41Z
dc.date.available: 2024-11-20T00:38:41Z
dc.date.issued: 2024-12-01
dc.identifier.citation: Computers and Education: Artificial Intelligence, 2024, v. 7
dc.identifier.issn: 2666-920X
dc.identifier.uri: http://hdl.handle.net/10722/351330
dc.description.abstract: High-quality instruction is essential to facilitating student learning, prompting many professional development (PD) programmes for teachers to focus on improving classroom dialogue. However, during PD programmes, analysing discourse data is time-consuming, delaying feedback on teachers' performance and potentially impairing the programmes' effectiveness. We therefore explored the use of ChatGPT (a fine-tuned GPT-3.5 series model) and GPT-4o to automate the coding of classroom discourse data. We equipped these AI tools with a codebook designed for mathematics discourse and academically productive talk. Our dataset consisted of over 400 authentic talk turns in Chinese from synchronous online mathematics lessons. The coding outcomes of ChatGPT and GPT-4o were quantitatively compared against a human standard. Qualitative analysis was conducted to understand their coding decisions. The overall agreement between the human standard, ChatGPT output, and GPT-4o output was moderate (Fleiss's Kappa = 0.46) when classifying talk turns into major categories. Pairwise comparisons indicated that GPT-4o (Cohen's Kappa = 0.69) had better performance than ChatGPT (Cohen's Kappa = 0.33). However, at the code level, the performance of both AI tools was unsatisfactory. Based on the identified competences and weaknesses, we propose a two-stage approach to classroom discourse analysis. Specifically, GPT-4o can be employed for the initial category-level analysis, following which teacher educators can conduct a more detailed code-level analysis and refine the coding outcomes. This approach can facilitate timely provision of analytical resources for teachers to reflect on their teaching practices.
dc.language: eng
dc.publisher: Elsevier
dc.relation.ispartof: Computers and Education: Artificial Intelligence
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject: ChatGPT
dc.subject: Classroom discourse analysis
dc.subject: GPT-4o
dc.subject: Mathematics instruction
dc.subject: Professional development
dc.title: Evaluating the performance of ChatGPT and GPT-4o in coding classroom discourse data: A study of synchronous online mathematics instruction
dc.type: Article
dc.identifier.doi: 10.1016/j.caeai.2024.100325
dc.identifier.scopus: eid_2-s2.0-85207876069
dc.identifier.volume: 7
dc.identifier.eissn: 2666-920X
dc.identifier.issnl: 2666-920X
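Repositories of this kind typically expose such Dublin Core records over OAI-PMH, and this item offers an XML export via that interface. The sketch below shows a generic GetRecord request; the verb and parameters are standard OAI-PMH, but the endpoint URL and the OAI identifier are assumptions inferred from the handle, not values confirmed by this record.

```python
# Hedged sketch: fetch this item's Dublin Core record over OAI-PMH and print its fields.
# The verb/parameters are standard OAI-PMH; BASE_URL and IDENTIFIER are assumptions
# inferred from the handle (10722/351330), not confirmed by the record above.
import requests
import xml.etree.ElementTree as ET

BASE_URL = "https://hub.hku.hk/oai/request"      # assumed DSpace-style OAI endpoint
IDENTIFIER = "oai:hub.hku.hk:10722/351330"       # assumed OAI identifier for this item

resp = requests.get(
    BASE_URL,
    params={"verb": "GetRecord", "metadataPrefix": "oai_dc", "identifier": IDENTIFIER},
    timeout=30,
)
resp.raise_for_status()

DC_NS = "{http://purl.org/dc/elements/1.1/}"     # namespace of oai_dc elements
root = ET.fromstring(resp.content)
for elem in root.iter():
    if elem.tag.startswith(DC_NS) and elem.text and elem.text.strip():
        print(f"dc.{elem.tag[len(DC_NS):]}: {elem.text.strip()}")
```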
