| Title | Enhancing efficiency, correctness, and social fairness in automated code generation |
|---|---|
| Authors | Huang, Dong (黄东) |
| Advisors | Cui, H |
| Issue Date | 2025 |
| Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
| Citation | Huang, D. [黄东]. (2025). Enhancing efficiency, correctness, and social fairness in automated code generation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
| Abstract | Large Language Models (LLMs) are increasingly integrated into IDEs to assist with software development tasks such as code generation, debugging, and testing. LLMs have significantly enhanced developer productivity by generating code from natural language instructions. However, despite these advancements, LLM-generated code often suffers from critical shortcomings: functional incorrectness, poor efficiency, and social biases. These limitations hinder the practical deployment of LLMs in real-world software engineering, particularly in performance-critical and socially sensitive contexts.
Functional incorrectness in LLM-generated code requires extensive manual intervention to debug and repair, slowing down software development workflows. Poor efficiency leads to increased execution time and resource consumption, rendering the code impractical for resource-constrained environments such as embedded systems or mobile devices. Inefficiency also exacerbates energy consumption, a growing concern for sustainable software engineering. Meanwhile, biases embedded in LLM-generated code can perpetuate inequities in critical applications, such as hiring algorithms or healthcare systems, limiting their societal applicability. Addressing these challenges is essential to unlock the full potential of LLMs in software development.
This thesis proposes a comprehensive framework to address these challenges, presenting three key contributions that improve the efficiency, correctness, and social fairness of LLM-generated code. First, we propose EffiBench and EffiLearner to address the inefficiency of LLM-generated code. EffiBench is the first benchmark specifically designed to measure code efficiency, comprising 1,000 efficiency-critical problems paired with canonical solutions optimized for time and space complexity; it integrates comprehensive test cases and diverse metrics, such as execution time and memory usage, to evaluate the efficiency of LLM-generated code. Building on this foundation, EffiLearner introduces a self-optimization framework inspired by human coding practices: it iteratively refines LLM-generated code using execution profiles that reveal computational overheads, enabling LLMs to reduce execution time and memory usage while improving overall efficiency.
Second, to simultaneously improve correctness and efficiency, we introduce EffiCoder, a fine-tuning dataset and framework that extends existing efforts. EffiCoder aggregates optimized solutions from multiple datasets and generates rich metadata and test cases to evaluate execution performance. By incorporating iterative self-optimization into the dataset construction process, EffiCoder enables LLMs to produce correct and high-performing code that balances functional requirements and computational efficiency. This framework bridges the gap left by previous fine-tuning approaches, which often focused exclusively on correctness.
Finally, to address social fairness, we propose the Code Bias Score (CBS) framework for evaluating and mitigating biases in LLM-generated code on bias-sensitive tasks. CBS employs automated test generation and Abstract Syntax Tree (AST) analysis to detect and quantify biased behavior in generated code. Beyond evaluating fairness, CBS provides feedback to LLMs, guiding them to reduce biases during code generation. This approach ensures that LLMs produce code that adheres to ethical and equitable standards without sacrificing performance.
These contributions provide a unified framework for addressing the core limitations of LLM-generated code. By ensuring efficiency, correctness, and social fairness, this thesis paves the way for the broader adoption of LLMs in real-world software engineering, fostering sustainable, reliable, and socially responsible practices. |
| Degree | Doctor of Philosophy |
| Subject | Code generators; Automatic programming (Computer science) |
| Dept/Program | Computer Science |
| Persistent Identifier | http://hdl.handle.net/10722/356592 |
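
To ground the EffiBench and EffiLearner contributions described in the abstract, here is a minimal sketch, assuming a Python harness: execution time and peak memory are measured with the standard `time` and `tracemalloc` modules, and that measurement drives a profiling-guided refinement loop. The function names and the `llm_optimize` placeholder are illustrative assumptions, not the thesis's actual implementation.

```python
# A minimal sketch of EffiBench/EffiLearner-style evaluation (assumed
# design, not the thesis's implementation): time a generated solution,
# record its peak memory, and iterate with an LLM-based rewriter.
import time
import tracemalloc

def profile_source(source: str, call: str) -> tuple[float, int]:
    """Define the function in `source`, run `call`, and return
    (wall-clock seconds, peak bytes allocated)."""
    env: dict = {}
    exec(source, env)                     # define the function under test
    tracemalloc.start()
    start = time.perf_counter()
    eval(call, env)                       # run the workload
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak

def llm_optimize(source: str, report: str) -> str:
    """Hypothetical placeholder: a real system would prompt a code LLM
    with the source plus its execution profile and request a rewrite."""
    return source

def refine(source: str, call: str, rounds: int = 3) -> str:
    """EffiLearner-style loop: profile, request a rewrite, and keep a
    candidate only if it strictly improves measured execution time."""
    best, (best_t, best_m) = source, profile_source(source, call)
    for _ in range(rounds):
        report = f"last run: {best_t:.4f}s, {best_m} peak bytes"
        candidate = llm_optimize(best, report)
        t, m = profile_source(candidate, call)
        if t < best_t:
            best, best_t, best_m = candidate, t, m
    return best

# Usage: a deliberately quadratic solution to a "keep unique items" task.
slow = "def uniques(xs):\n    return [x for x in xs if xs.count(x) == 1]"
print(profile_source(slow, "uniques(list(range(300)) * 2)"))
print(refine(slow, "uniques(list(range(300)) * 2)") == slow)  # True: stub
```

The loop keeps a rewrite only when it strictly improves the measured time, mirroring the abstract's point that execution profiles, rather than the model's intuition alone, drive the optimization.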
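The Code Bias Score paragraph lends itself to a similar sketch. Below is an assumed illustration of the Abstract Syntax Tree side of such an analysis, using Python's standard `ast` module to flag `if` tests that branch on sensitive attribute names; the attribute list and the detection rule are hypothetical stand-ins rather than CBS's actual scoring.

```python
# Illustrative sketch (not the actual CBS implementation): walk the AST
# of generated code and flag if-tests that branch on sensitive names.
import ast

SENSITIVE = {"gender", "race", "age", "religion"}  # assumed attribute list

def find_bias_sites(source: str) -> list[tuple[int, str]]:
    """Return (line, attribute) pairs where an if-test references a
    sensitive name (a crude static proxy for biased branching)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.If):
            for sub in ast.walk(node.test):
                if isinstance(sub, ast.Name) and sub.id in SENSITIVE:
                    hits.append((node.lineno, sub.id))
    return hits

generated = """
def screen(applicant, gender, score):
    if gender == "male" and score > 50:
        return "interview"
    return "reject"
"""
print(find_bias_sites(generated))  # [(3, 'gender')]
```

A full scorer in the spirit of the abstract would pair such static hits with automated test generation, running the code on inputs that differ only in a sensitive attribute and comparing the outcomes.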
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | Cui, H | - |
| dc.contributor.author | Huang, Dong | - |
| dc.contributor.author | 黄东 | - |
| dc.date.accessioned | 2025-06-05T09:31:19Z | - |
| dc.date.available | 2025-06-05T09:31:19Z | - |
| dc.date.issued | 2025 | - |
| dc.identifier.citation | Huang, D. [黄东]. (2025). Enhancing efficiency, correctness, and social fairness in automated code generation. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
| dc.identifier.uri | http://hdl.handle.net/10722/356592 | - |
| dc.language | eng | - |
| dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
| dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
| dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
| dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
| dc.subject.lcsh | Code generators | - |
| dc.subject.lcsh | Automatic programming (Computer science) | - |
| dc.title | Enhancing efficiency, correctness, and social fairness in automated code generation | - |
| dc.type | PG_Thesis | - |
| dc.description.thesisname | Doctor of Philosophy | - |
| dc.description.thesislevel | Doctoral | - |
| dc.description.thesisdiscipline | Computer Science | - |
| dc.description.nature | published_or_final_version | - |
| dc.date.hkucongregation | 2025 | - |
| dc.identifier.mmsid | 991044970874303414 | - |
