File Download: There are no files associated with this item.

Links for fulltext (may require subscription):
- Publisher Website: 10.1016/j.caeai.2024.100325
- Scopus: eid_2-s2.0-85207876069

Citations:
- Scopus: 0
Article: Evaluating the performance of ChatGPT and GPT-4o in coding classroom discourse data: A study of synchronous online mathematics instruction
Title | Evaluating the performance of ChatGPT and GPT-4o in coding classroom discourse data: A study of synchronous online mathematics instruction |
---|---|
Authors | Xu, Simin; Huang, Xiaowei; Lo, Chung Kwan; Chen, Gaowei; Jong, Morris Siu yung |
Keywords | ChatGPT; Classroom discourse analysis; GPT-4o; Mathematics instruction; Professional development |
Issue Date | 1-Dec-2024 |
Publisher | Elsevier |
Citation | Computers and Education: Artificial Intelligence, 2024, v. 7 |
Abstract | High-quality instruction is essential to facilitating student learning, prompting many professional development (PD) programmes for teachers to focus on improving classroom dialogue. However, during PD programmes, analysing discourse data is time-consuming, delaying feedback on teachers' performance and potentially impairing the programmes' effectiveness. We therefore explored the use of ChatGPT (a fine-tuned GPT-3.5 series model) and GPT-4o to automate the coding of classroom discourse data. We equipped these AI tools with a codebook designed for mathematics discourse and academically productive talk. Our dataset consisted of over 400 authentic talk turns in Chinese from synchronous online mathematics lessons. The coding outcomes of ChatGPT and GPT-4o were quantitatively compared against a human standard. Qualitative analysis was conducted to understand their coding decisions. The overall agreement between the human standard, ChatGPT output, and GPT-4o output was moderate (Fleiss's Kappa = 0.46) when classifying talk turns into major categories. Pairwise comparisons indicated that GPT-4o (Cohen's Kappa = 0.69) had better performance than ChatGPT (Cohen's Kappa = 0.33). However, at the code level, the performance of both AI tools was unsatisfactory. Based on the identified competences and weaknesses, we propose a two-stage approach to classroom discourse analysis. Specifically, GPT-4o can be employed for the initial category-level analysis, following which teacher educators can conduct a more detailed code-level analysis and refine the coding outcomes. This approach can facilitate timely provision of analytical resources for teachers to reflect on their teaching practices. |
Persistent Identifier | http://hdl.handle.net/10722/351330 |
ISSN | 2666-920X (2023 SCImago Journal Rankings: 3.227) |
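The abstract reports inter-coder agreement using Cohen's kappa, a chance-corrected agreement statistic between two coders (here, a human standard versus a model's output). As a minimal illustrative sketch of how such a pairwise statistic is computed, the following standalone Python function implements the standard formula; the six-turn label lists are hypothetical examples, not data from the study:

```python
from collections import Counter

def cohen_kappa(coder1, coder2):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is observed
    agreement and p_e is the agreement expected by chance from each
    coder's marginal label distribution."""
    assert len(coder1) == len(coder2) and coder1
    n = len(coder1)
    # Observed agreement: proportion of items coded identically.
    p_o = sum(a == b for a, b in zip(coder1, coder2)) / n
    # Chance agreement: product of marginal proportions per category.
    c1, c2 = Counter(coder1), Counter(coder2)
    p_e = sum(c1[cat] * c2.get(cat, 0) for cat in c1) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical category labels for six talk turns (for illustration only):
human = ["Teacher", "Teacher", "Student", "Student", "Teacher", "Student"]
model = ["Teacher", "Student", "Student", "Student", "Teacher", "Student"]
print(round(cohen_kappa(human, model), 2))  # 0.67
```

Fleiss's kappa, also cited in the abstract, generalises the same idea to more than two raters (here, three: the human standard, ChatGPT, and GPT-4o).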
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Xu, Simin | - |
dc.contributor.author | Huang, Xiaowei | - |
dc.contributor.author | Lo, Chung Kwan | - |
dc.contributor.author | Chen, Gaowei | - |
dc.contributor.author | Jong, Morris Siu yung | - |
dc.date.accessioned | 2024-11-20T00:38:41Z | - |
dc.date.available | 2024-11-20T00:38:41Z | - |
dc.date.issued | 2024-12-01 | - |
dc.identifier.citation | Computers and Education: Artificial Intelligence, 2024, v. 7 | - |
dc.identifier.issn | 2666-920X | - |
dc.identifier.uri | http://hdl.handle.net/10722/351330 | - |
dc.description.abstract | High-quality instruction is essential to facilitating student learning, prompting many professional development (PD) programmes for teachers to focus on improving classroom dialogue. However, during PD programmes, analysing discourse data is time-consuming, delaying feedback on teachers' performance and potentially impairing the programmes' effectiveness. We therefore explored the use of ChatGPT (a fine-tuned GPT-3.5 series model) and GPT-4o to automate the coding of classroom discourse data. We equipped these AI tools with a codebook designed for mathematics discourse and academically productive talk. Our dataset consisted of over 400 authentic talk turns in Chinese from synchronous online mathematics lessons. The coding outcomes of ChatGPT and GPT-4o were quantitatively compared against a human standard. Qualitative analysis was conducted to understand their coding decisions. The overall agreement between the human standard, ChatGPT output, and GPT-4o output was moderate (Fleiss's Kappa = 0.46) when classifying talk turns into major categories. Pairwise comparisons indicated that GPT-4o (Cohen's Kappa = 0.69) had better performance than ChatGPT (Cohen's Kappa = 0.33). However, at the code level, the performance of both AI tools was unsatisfactory. Based on the identified competences and weaknesses, we propose a two-stage approach to classroom discourse analysis. Specifically, GPT-4o can be employed for the initial category-level analysis, following which teacher educators can conduct a more detailed code-level analysis and refine the coding outcomes. This approach can facilitate timely provision of analytical resources for teachers to reflect on their teaching practices. | - |
dc.language | eng | - |
dc.publisher | Elsevier | - |
dc.relation.ispartof | Computers and Education: Artificial Intelligence | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject | ChatGPT | - |
dc.subject | Classroom discourse analysis | - |
dc.subject | GPT-4o | - |
dc.subject | Mathematics instruction | - |
dc.subject | Professional development | - |
dc.title | Evaluating the performance of ChatGPT and GPT-4o in coding classroom discourse data: A study of synchronous online mathematics instruction | - |
dc.type | Article | - |
dc.identifier.doi | 10.1016/j.caeai.2024.100325 | - |
dc.identifier.scopus | eid_2-s2.0-85207876069 | - |
dc.identifier.volume | 7 | - |
dc.identifier.eissn | 2666-920X | - |
dc.identifier.issnl | 2666-920X | - |