Conference Paper: Can LLMs Solve Longer Math Word Problems Better?

Title: Can LLMs Solve Longer Math Word Problems Better?
Authors: Xu, Xin; Xiao, Tong; Chao, Zitong; Huang, Zhenya; Yang, Can; Wang, Yang
Issue Date: 2025
Citation: 13th International Conference on Learning Representations (ICLR 2025), 2025, p. 24767-24798
Abstract: Math Word Problems (MWPs) play a vital role in assessing the capabilities of Large Language Models (LLMs), yet current research primarily focuses on questions with concise contexts. The impact of longer contexts on mathematical reasoning remains under-explored. This study pioneers the investigation of Context Length Generalizability (CoLeG), which refers to the ability of LLMs to solve MWPs with extended narratives. We introduce Extended Grade-School Math (E-GSM), a collection of MWPs featuring lengthy narratives, and propose two novel metrics to evaluate the efficacy and resilience of LLMs in tackling these problems. Our analysis of existing zero-shot prompting techniques with both proprietary and open-source LLMs reveals a general deficiency in CoLeG. To alleviate these issues, we propose tailored approaches for different categories of LLMs. For proprietary LLMs, we introduce a new instructional prompt designed to mitigate the impact of long contexts. For open-source LLMs, we develop a novel auxiliary task for fine-tuning to enhance CoLeG. Our comprehensive results demonstrate the effectiveness of our proposed methods, showing improved performance on E-GSM. Additionally, we conduct an in-depth analysis to differentiate the effects of semantic understanding and reasoning efficacy, showing that our methods improve the latter. We also establish the generalizability of our methods across several other MWP benchmarks. Our findings highlight the limitations of current LLMs and offer corresponding practical solutions, paving the way for further exploration of model generalizability and training methodologies.
Persistent Identifier: http://hdl.handle.net/10722/363045
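
The abstract describes measuring how well LLMs solve MWPs with extended narratives under zero-shot prompting. As a rough illustration only (the problem record format, the query_llm stub, the generic chain-of-thought prompt, and the last-number answer-extraction rule below are assumptions for this sketch, not the paper's actual E-GSM schema or evaluation harness), a minimal accuracy check over long-narrative problems might look like:

    import re
    from typing import Callable

    # Hypothetical record format: a long narrative question plus a gold numeric answer.
    problems = [
        {"question": "A very long grade-school story ... How many apples remain?", "answer": "7"},
    ]

    def extract_final_number(text: str) -> str | None:
        """Take the last number in the model's response as its final answer (a common heuristic)."""
        matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
        return matches[-1] if matches else None

    def evaluate(query_llm: Callable[[str], str]) -> float:
        """Zero-shot accuracy on long-narrative MWPs: prompt, extract a number, compare with gold."""
        correct = 0
        for p in problems:
            prompt = p["question"] + "\nLet's think step by step."  # generic zero-shot CoT prompt
            response = query_llm(prompt)
            pred = extract_final_number(response)
            correct += int(pred is not None and float(pred) == float(p["answer"]))
        return correct / len(problems)

Here query_llm stands in for whichever proprietary or open-source model is being evaluated.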

 

DC Field: Value
dc.contributor.author: Xu, Xin
dc.contributor.author: Xiao, Tong
dc.contributor.author: Chao, Zitong
dc.contributor.author: Huang, Zhenya
dc.contributor.author: Yang, Can
dc.contributor.author: Wang, Yang
dc.date.accessioned: 2025-10-10T07:44:14Z
dc.date.available: 2025-10-10T07:44:14Z
dc.date.issued: 2025
dc.identifier.citation: 13th International Conference on Learning Representations (ICLR 2025), 2025, p. 24767-24798
dc.identifier.uri: http://hdl.handle.net/10722/363045
dc.description.abstract: Math Word Problems (MWPs) play a vital role in assessing the capabilities of Large Language Models (LLMs), yet current research primarily focuses on questions with concise contexts. The impact of longer contexts on mathematical reasoning remains under-explored. This study pioneers the investigation of Context Length Generalizability (CoLeG), which refers to the ability of LLMs to solve MWPs with extended narratives. We introduce Extended Grade-School Math (E-GSM), a collection of MWPs featuring lengthy narratives, and propose two novel metrics to evaluate the efficacy and resilience of LLMs in tackling these problems. Our analysis of existing zero-shot prompting techniques with both proprietary and open-source LLMs reveals a general deficiency in CoLeG. To alleviate these issues, we propose tailored approaches for different categories of LLMs. For proprietary LLMs, we introduce a new instructional prompt designed to mitigate the impact of long contexts. For open-source LLMs, we develop a novel auxiliary task for fine-tuning to enhance CoLeG. Our comprehensive results demonstrate the effectiveness of our proposed methods, showing improved performance on E-GSM. Additionally, we conduct an in-depth analysis to differentiate the effects of semantic understanding and reasoning efficacy, showing that our methods improve the latter. We also establish the generalizability of our methods across several other MWP benchmarks. Our findings highlight the limitations of current LLMs and offer corresponding practical solutions, paving the way for further exploration of model generalizability and training methodologies.
dc.language: eng
dc.relation.ispartof: 13th International Conference on Learning Representations (ICLR 2025)
dc.title: Can LLMs Solve Longer Math Word Problems Better?
dc.type: Conference_Paper
dc.description.nature: link_to_subscribed_fulltext
dc.identifier.scopus: eid_2-s2.0-105010206593
dc.identifier.spage: 24767
dc.identifier.epage: 24798
