Links for fulltext (may require subscription):
- Publisher Website: 10.1109/TLT.2025.3526582
- Scopus: eid_2-s2.0-85214515034
- WOS: WOS:001425521300001
Article: Data Augmentation for Sparse Multidimensional Learning Performance Data Using Generative AI
| Title | Data Augmentation for Sparse Multidimensional Learning Performance Data Using Generative AI |
|---|---|
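| Authors | Zhang, Liang; Lin, Jionghao; Sabatini, John; Borchers, Conrad; Weitekamp, Daniel; Cao, Meng; Hollander, John; Hu, Xiangen; Graesser, Arthur C. |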
| Keywords | Data augmentation; data sparsity; generative artificial intelligence; intelligent tutoring system; learning performance data |
| Issue Date | 2025 |
| Citation | IEEE Transactions on Learning Technologies, 2025 |
| Abstract | Learning performance data, such as correct or incorrect answers and problem-solving attempts in Intelligent Tutoring Systems (ITSs), facilitate the assessment of knowledge mastery and the delivery of effective instruction. However, these data tend to be highly sparse (80% to 90% missing observations) in most real-world applications. This data sparsity presents challenges to using learner models to effectively predict learners' future performance and to explore new hypotheses about learning. This article proposes a systematic framework for augmenting learning performance data to address data sparsity. First, learning performance data can be represented as a three-dimensional (3D) tensor with dimensions corresponding to learners, questions, and attempts, effectively capturing longitudinal knowledge states during learning. Second, a tensor factorization method is used to impute missing values in sparse tensors of collected learner data, thereby grounding the imputation in knowledge tracing tasks that predict missing performance values based on real observations. Third, Generative Artificial Intelligence (GenAI) models, specifically a vanilla Generative Adversarial Network (GAN) and a Generative Pre-trained Transformer (GPT-4o), are used to generate data tailored to individual clusters of learning performance. We tested this systematic framework on adult literacy datasets from AutoTutor lessons developed for Adult Reading Comprehension (ARC). We found that (1) tensor factorization outperformed baseline knowledge tracing techniques in tracing and predicting learning performance, demonstrating higher fidelity in data imputation, and (2) vanilla GAN-based augmentation demonstrated greater overall stability across varying sample sizes, whereas GPT-4o-based augmentation exhibited higher variability, with occasional cases showing closer fidelity to the original data distribution. This framework facilitates the effective augmentation of learning performance data, enabling a controlled, cost-effective approach to evaluating and optimizing ITS instructional designs in both online and offline environments prior to deployment, and supporting advanced educational data mining and learning analytics. |
| Persistent Identifier | http://hdl.handle.net/10722/354419 |
| ISI Accession Number ID | WOS:001425521300001 |

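The first two steps in the abstract (representing performance as a learner × question × attempt tensor, then imputing the missing cells by tensor factorization) can be illustrated with a minimal sketch. It assumes binary correctness scores and uses a generic CP (CANDECOMP/PARAFAC) model fitted by stochastic gradient descent on the observed cells only. The function names `sparsity`, `fit_cp`, and `impute`, the hyperparameters, and the toy data are hypothetical; the article's own tensor factorization method may differ.

```python
import itertools
import random


def sparsity(observed, shape):
    """Fraction of learner x question x attempt cells with no recorded answer."""
    n_l, n_q, n_a = shape
    return 1.0 - len(observed) / (n_l * n_q * n_a)


def fit_cp(observed, shape, rank=3, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Hypothetical helper: fit a rank-`rank` CP model to observed cells via SGD.

    X[l, q, a] is approximated by sum_r U[l][r] * V[q][r] * W[a][r].
    Missing cells never enter the loss, so they do not bias the fit toward 0.
    """
    rng = random.Random(seed)

    def init(n):
        return [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n)]

    U, V, W = init(shape[0]), init(shape[1]), init(shape[2])
    cells = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(cells)
        for (l, q, a), y in cells:
            err = sum(U[l][r] * V[q][r] * W[a][r] for r in range(rank)) - y
            for r in range(rank):
                u, v, w = U[l][r], V[q][r], W[a][r]
                # Gradient of 0.5 * err^2 + 0.5 * reg * (u^2 + v^2 + w^2)
                U[l][r] -= lr * (err * v * w + reg * u)
                V[q][r] -= lr * (err * u * w + reg * v)
                W[a][r] -= lr * (err * u * v + reg * w)
    return U, V, W


def impute(observed, shape, factors):
    """Hypothetical helper: dense tensor with observed values kept, gaps predicted."""
    U, V, W = factors
    rank = len(U[0])
    dense = {}
    for l, q, a in itertools.product(*(range(n) for n in shape)):
        if (l, q, a) in observed:
            dense[(l, q, a)] = observed[(l, q, a)]
        else:
            raw = sum(U[l][r] * V[q][r] * W[a][r] for r in range(rank))
            dense[(l, q, a)] = min(1.0, max(0.0, raw))  # clip to [0, 1]
    return dense


# Toy data (hypothetical): 5 learners, 4 questions, up to 3 attempts.
# Keys are (learner, question, attempt); values are 1 = correct, 0 = incorrect.
SHAPE = (5, 4, 3)
observed = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1,
    (1, 0, 0): 1, (1, 2, 0): 0, (1, 2, 1): 1,
    (2, 1, 0): 1, (3, 3, 0): 0, (4, 3, 1): 1,
}
print(f"sparsity = {sparsity(observed, SHAPE):.2f}")  # prints sparsity = 0.85
factors = fit_cp(observed, SHAPE)
full = impute(observed, SHAPE, factors)
```

The toy tensor has 9 observed cells out of 60, which falls inside the 80% to 90% sparsity range the abstract reports. Fitting only on observed cells is the design choice that matters here: treating missing cells as zeros would make the model learn "unanswered" as "incorrect". A dense tensor like `full` is the kind of input that the GAN- or GPT-based augmentation step would then work from.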
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Zhang, Liang | - |
| dc.contributor.author | Lin, Jionghao | - |
| dc.contributor.author | Sabatini, John | - |
| dc.contributor.author | Borchers, Conrad | - |
| dc.contributor.author | Weitekamp, Daniel | - |
| dc.contributor.author | Cao, Meng | - |
| dc.contributor.author | Hollander, John | - |
| dc.contributor.author | Hu, Xiangen | - |
| dc.contributor.author | Graesser, Arthur C. | - |
| dc.date.accessioned | 2025-02-07T08:48:29Z | - |
| dc.date.available | 2025-02-07T08:48:29Z | - |
| dc.date.issued | 2025 | - |
| dc.identifier.citation | IEEE Transactions on Learning Technologies, 2025 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/354419 | - |
| dc.language | eng | - |
| dc.relation.ispartof | IEEE Transactions on Learning Technologies | - |
| dc.subject | Data augmentation | - |
| dc.subject | data sparsity | - |
| dc.subject | generative artificial intelligence | - |
| dc.subject | intelligent tutoring system | - |
| dc.subject | learning performance data | - |
| dc.title | Data Augmentation for Sparse Multidimensional Learning Performance Data Using Generative AI | - |
| dc.type | Article | - |
| dc.description.nature | link_to_subscribed_fulltext | - |
| dc.identifier.doi | 10.1109/TLT.2025.3526582 | - |
| dc.identifier.scopus | eid_2-s2.0-85214515034 | - |
| dc.identifier.eissn | 1939-1382 | - |
| dc.identifier.isi | WOS:001425521300001 | - |
