Article: Evaluating the performance of ChatGPT and GPT-4o in coding classroom discourse data: A study of synchronous online mathematics instruction

Title: Evaluating the performance of ChatGPT and GPT-4o in coding classroom discourse data: A study of synchronous online mathematics instruction
Authors: Xu, Simin; Huang, Xiaowei; Lo, Chung Kwan; Chen, Gaowei; Jong, Morris Siu yung
Keywords: ChatGPT; Classroom discourse analysis; GPT-4o; Mathematics instruction; Professional development
Issue Date: 1-Dec-2024
Publisher: Elsevier
Citation: Computers and Education: Artificial Intelligence, 2024, v. 7
Abstract

High-quality instruction is essential to facilitating student learning, prompting many professional development (PD) programmes for teachers to focus on improving classroom dialogue. However, during PD programmes, analysing discourse data is time-consuming, delaying feedback on teachers' performance and potentially impairing the programmes' effectiveness. We therefore explored the use of ChatGPT (a fine-tuned GPT-3.5 series model) and GPT-4o to automate the coding of classroom discourse data. We equipped these AI tools with a codebook designed for mathematics discourse and academically productive talk. Our dataset consisted of over 400 authentic talk turns in Chinese from synchronous online mathematics lessons. The coding outcomes of ChatGPT and GPT-4o were quantitatively compared against a human standard. Qualitative analysis was conducted to understand their coding decisions. The overall agreement between the human standard, ChatGPT output, and GPT-4o output was moderate (Fleiss's Kappa = 0.46) when classifying talk turns into major categories. Pairwise comparisons indicated that GPT-4o (Cohen's Kappa = 0.69) had better performance than ChatGPT (Cohen's Kappa = 0.33). However, at the code level, the performance of both AI tools was unsatisfactory. Based on the identified competences and weaknesses, we propose a two-stage approach to classroom discourse analysis. Specifically, GPT-4o can be employed for the initial category-level analysis, following which teacher educators can conduct a more detailed code-level analysis and refine the coding outcomes. This approach can facilitate timely provision of analytical resources for teachers to reflect on their teaching practices.
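The category-level agreement figures reported above (Fleiss's Kappa = 0.46 across the three sources; Cohen's Kappa = 0.69 for GPT-4o and 0.33 for ChatGPT against the human standard) are standard inter-rater statistics. The sketch below is illustrative only and is not the authors' code: the category names and the short label lists are hypothetical placeholders standing in for the study's roughly 400 human/ChatGPT/GPT-4o labels.

```python
# Illustrative sketch (not the authors' code) of the agreement statistics reported
# in the abstract. Category names and the five talk turns are hypothetical placeholders.
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# One category label per talk turn from each "rater".
human   = ["elicitation", "response", "evaluation", "elicitation", "other"]
chatgpt = ["elicitation", "response", "elicitation", "elicitation", "other"]
gpt4o   = ["elicitation", "response", "evaluation", "elicitation", "response"]

# Pairwise agreement with the human standard (Cohen's kappa).
print("ChatGPT vs human:", cohen_kappa_score(human, chatgpt))
print("GPT-4o vs human: ", cohen_kappa_score(human, gpt4o))

# Overall three-way agreement (Fleiss's kappa): one row per talk turn,
# aggregated into per-category counts before computing the statistic.
table, _ = aggregate_raters(list(zip(human, chatgpt, gpt4o)))
print("Human/ChatGPT/GPT-4o:", fleiss_kappa(table))
```

Applied to the full set of coded talk turns, the same two calls yield the pairwise and three-way kappas of the kind reported in the abstract.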


Persistent Identifier: http://hdl.handle.net/10722/351330
ISSN: 2666-920X
2023 SCImago Journal Rankings: 3.227

 

DC Field: Value
dc.contributor.author: Xu, Simin
dc.contributor.author: Huang, Xiaowei
dc.contributor.author: Lo, Chung Kwan
dc.contributor.author: Chen, Gaowei
dc.contributor.author: Jong, Morris Siu yung
dc.date.accessioned: 2024-11-20T00:38:41Z
dc.date.available: 2024-11-20T00:38:41Z
dc.date.issued: 2024-12-01
dc.identifier.citation: Computers and Education: Artificial Intelligence, 2024, v. 7
dc.identifier.issn: 2666-920X
dc.identifier.uri: http://hdl.handle.net/10722/351330
dc.description.abstract: High-quality instruction is essential to facilitating student learning, prompting many professional development (PD) programmes for teachers to focus on improving classroom dialogue. However, during PD programmes, analysing discourse data is time-consuming, delaying feedback on teachers' performance and potentially impairing the programmes' effectiveness. We therefore explored the use of ChatGPT (a fine-tuned GPT-3.5 series model) and GPT-4o to automate the coding of classroom discourse data. We equipped these AI tools with a codebook designed for mathematics discourse and academically productive talk. Our dataset consisted of over 400 authentic talk turns in Chinese from synchronous online mathematics lessons. The coding outcomes of ChatGPT and GPT-4o were quantitatively compared against a human standard. Qualitative analysis was conducted to understand their coding decisions. The overall agreement between the human standard, ChatGPT output, and GPT-4o output was moderate (Fleiss's Kappa = 0.46) when classifying talk turns into major categories. Pairwise comparisons indicated that GPT-4o (Cohen's Kappa = 0.69) had better performance than ChatGPT (Cohen's Kappa = 0.33). However, at the code level, the performance of both AI tools was unsatisfactory. Based on the identified competences and weaknesses, we propose a two-stage approach to classroom discourse analysis. Specifically, GPT-4o can be employed for the initial category-level analysis, following which teacher educators can conduct a more detailed code-level analysis and refine the coding outcomes. This approach can facilitate timely provision of analytical resources for teachers to reflect on their teaching practices.
dc.language: eng
dc.publisher: Elsevier
dc.relation.ispartof: Computers and Education: Artificial Intelligence
dc.rights: This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
dc.subject: ChatGPT
dc.subject: Classroom discourse analysis
dc.subject: GPT-4o
dc.subject: Mathematics instruction
dc.subject: Professional development
dc.title: Evaluating the performance of ChatGPT and GPT-4o in coding classroom discourse data: A study of synchronous online mathematics instruction
dc.type: Article
dc.identifier.doi: 10.1016/j.caeai.2024.100325
dc.identifier.scopus: eid_2-s2.0-85207876069
dc.identifier.volume: 7
dc.identifier.eissn: 2666-920X
dc.identifier.issnl: 2666-920X
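Repositories of this kind typically expose such Dublin Core records over OAI-PMH, and this item offers an XML export via that interface. The sketch below shows a generic GetRecord request; the verb and parameters are standard OAI-PMH, but the endpoint URL and the OAI identifier are assumptions inferred from the handle, not values confirmed by this record.

```python
# Hedged sketch: fetch this item's Dublin Core record over OAI-PMH and print its fields.
# The verb/parameters are standard OAI-PMH; BASE_URL and IDENTIFIER are assumptions
# inferred from the handle (10722/351330), not confirmed by the record above.
import requests
import xml.etree.ElementTree as ET

BASE_URL = "https://hub.hku.hk/oai/request"      # assumed DSpace-style OAI endpoint
IDENTIFIER = "oai:hub.hku.hk:10722/351330"       # assumed OAI identifier for this item

resp = requests.get(
    BASE_URL,
    params={"verb": "GetRecord", "metadataPrefix": "oai_dc", "identifier": IDENTIFIER},
    timeout=30,
)
resp.raise_for_status()

DC_NS = "{http://purl.org/dc/elements/1.1/}"     # namespace of oai_dc elements
root = ET.fromstring(resp.content)
for elem in root.iter():
    if elem.tag.startswith(DC_NS) and elem.text and elem.text.strip():
        print(f"dc.{elem.tag[len(DC_NS):]}: {elem.text.strip()}")
```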
