From Code Generation to Knowledge-Enhanced Program Repair: Shuyin Ouyang’s Research Makes an Impact

3rd October 2025 | News, Student News


STAI CDT PhD student Shuyin Ouyang has recently co-authored two research papers that were accepted for publication at leading venues.

Shedding Light on the Non-Determinism of ChatGPT

Shuyin was a co-author on the paper “Empirical Study of the Non-Determinism of ChatGPT in Code Generation”, which was accepted for publication in the journal ACM Transactions on Software Engineering and Methodology (TOSEM) and also accepted for presentation at the ACM International Conference on the Foundations of Software Engineering (FSE 2025) journal-first track.

Recently, there has been a surge of research into using Large Language Models (LLMs) for software engineering, especially for code generation. However, these models can be unpredictable, often producing very different code from the same prompt. This is referred to as non-determinism. This inconsistency makes it harder to ensure the code is correct and reliable, reduces developers’ trust in LLMs, and creates challenges for reproducing results in research.

Until now, no research had examined how serious the problem of non-determinism really is. To address this, Shuyin and his co-authors carried out the first empirical study of non-determinism in code generation with ChatGPT. Using three well-known code generation benchmarks, they found that the correctness, test results, and even the syntax and structure of code produced from the same instruction can vary greatly across different requests. They hope this research raises awareness that this unpredictability is a real issue for anyone relying on AI for coding. Their research has attracted over 300 citations so far.
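The kind of syntactic variation the study measured can be illustrated with a small sketch (this is not the paper's tooling, just an assumed way of comparing structures): parsing each candidate solution to its abstract syntax tree strips away formatting and comments, so only genuine structural differences remain.

```python
# Sketch: gauge syntactic variation among several candidate
# solutions an LLM returned for the same prompt.
import ast


def structural_signature(code: str) -> str:
    """Normalise a snippet to its AST dump so formatting and
    comments are ignored; only syntactic structure remains."""
    return ast.dump(ast.parse(code))


def distinct_structures(candidates: list[str]) -> int:
    """Count how many syntactically distinct programs were produced."""
    return len({structural_signature(c) for c in candidates})


# Three responses to the same prompt: the first two share a
# structure (a comment does not change the AST); the third differs.
responses = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a + b  # cosmetic difference only",
    "def add(a, b):\n    result = a + b\n    return result",
]
print(distinct_structures(responses))  # prints 2
```

A count above 1 signals that identical requests yielded structurally different programs, which is exactly the non-determinism the study documents.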

Shuyin says: “Like Forrest Gump said, ‘Life is like a box of chocolates; you never know what you’re gonna get.’ LLMs share this unpredictability: the code they generate is inherently non-deterministic. For researchers, this means we must carefully account for randomness to ensure experiments are reproducible and results are reliable. For developers, it means rigorously testing and validating generated code before deploying it to production. Awareness of this non-determinism is the first step toward building both trustworthy research and robust real-world systems.”

Advancing Program Repair with Knowledge-Enhanced Approaches

Shuyin also co-authored the paper ‘Knowledge-Enhanced Program Repair for Data Science Code’, which was accepted in the proceedings of the 47th International Conference on Software Engineering (ICSE 2025).

This paper introduces DSrepair, a knowledge-enhanced program repair approach designed to repair the buggy code generated by LLMs in the data science domain.

Buggy code contains errors that stop it from working as intended. Shuyin and his co-authors evaluated DSrepair with four LLMs against five baselines for data science code repair, and found that it significantly outperforms all the baselines. By integrating API knowledge retrieval and bug information enrichment, DSrepair delivers better repair performance and helps build people’s trust in using LLMs for coding.
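The idea behind knowledge-enhanced repair can be sketched in a few lines. The function name, prompt layout, and example documentation below are illustrative assumptions, not DSrepair's actual implementation: the point is simply that the repair prompt combines the buggy code, the observed error, and retrieved API documentation for the APIs involved.

```python
# Illustrative sketch (hypothetical names, not DSrepair's code):
# enrich a repair prompt with retrieved API knowledge.

def build_repair_prompt(buggy_code: str, error_message: str,
                        api_docs: dict[str, str]) -> str:
    """Assemble a repair prompt that pairs the bug with relevant
    API documentation retrieved for the APIs the code uses."""
    docs_section = "\n".join(
        f"- {name}: {doc}" for name, doc in api_docs.items()
    )
    return (
        "Fix the following data science code.\n\n"
        f"Buggy code:\n{buggy_code}\n\n"
        f"Error message:\n{error_message}\n\n"
        f"Relevant API documentation:\n{docs_section}\n"
    )


prompt = build_repair_prompt(
    buggy_code="df.groupby('city').mean('price')",
    error_message="TypeError: mean() got an unexpected argument ...",
    api_docs={
        "DataFrame.groupby": "Group rows by column values; aggregate "
                             "with .mean() on the grouped result.",
    },
)
print(prompt)
```

Feeding such an enriched prompt to the LLM, rather than the bare buggy code, is what gives the model the external knowledge it lacks.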

As Shuyin explains, “LLMs are emerging as powerful tools for automatically generating data science code. However, their limitations, such as hallucinations, limited domain expertise, and a lack of context awareness, pose serious challenges for reliable code generation. In our paper, we introduce the first knowledge graph–based retrieval-augmented generation (RAG) approach for data science code repair. By injecting external knowledge from API documentation into LLMs, our method enables more accurate, context-aware, and reliable data science code generation. Bridging LLMs with structured knowledge is a crucial step toward building safer, more transparent, and trustworthy AI-driven code generation”.

Explore the research

Ouyang, S., Zhang, J., Harman, M. & Wang, M. (2024). An Empirical Study of the Non-Determinism of ChatGPT in Code Generation. ACM Transactions on Software Engineering and Methodology (TOSEM).

Ouyang, S., Zhang, J., Sun, Z. & Meroño-Peñuela, A. (2025). Knowledge-Enhanced Program Repair for Data Science Code. In Proceedings of the 47th International Conference on Software Engineering (ICSE 2025).