Software, Benchmarks and Datasets

We present here some of the software, benchmarks and datasets developed by students from the UKRI Centre for Doctoral Training in Safe and Trusted Artificial Intelligence (grant reference number EP/S023356/1).

Argumentative LLMs: augments large language models with a formal reasoning layer based on computational argumentation; enables the generation of structured and faithful explanations of the models' reasoning, while allowing users to challenge and correct any issues they identify.

DSRepair: a knowledge-enhanced program repair approach designed to repair buggy code generated by LLMs in the data science domain; uses knowledge-graph-based retrieval-augmented generation (RAG) for API knowledge retrieval and bug knowledge enrichment to construct repair prompts for LLMs.

GLlama Alarm: a suite of knowledge-guided versions of Llama 2, instruction fine-tuned for non-binary abusive language detection and explanation generation tasks.

Neuro-AL: Neural Argumentative Learning (NAL), an architecture that integrates Assumption-Based Argumentation (ABA) with deep learning for image analysis.

Turbulence-Benchmark – Version 1.0: an innovative benchmark designed to systematically evaluate the correctness and robustness of instruction-tuned large language models (LLMs) for code generation.