STAI CDT PhD Student Seminar Afternoon

5 March 2025

12:00 pm - 6:00 pm

We’re delighted to invite you to the UKRI Centre for Doctoral Training (CDT) in Safe and Trusted Artificial Intelligence (STAI) PhD Student Seminar Afternoon.

PhD students in the STAI CDT are hosted at King’s College London and Imperial College London. They look at how one can ensure that an AI system is safe, meaning we can have some guarantees about the correctness of its behaviour, and trusted, meaning the average user can have well-placed confidence in the system and its decision making. The focus is particularly on the use of model-based, or symbolic, AI techniques for ensuring safety and trustworthiness, since these provide an explicit language for analysing, reasoning and communicating about systems and their behaviours.

This STAI CDT PhD Student Seminar Afternoon is the third of four scheduled for this academic year. We hope you can join us to find out more about the work our PhD students are doing.

Programme (abstracts given below)

12:00 arrival and lunch

12:45 welcome

12:50 Aditi Ramaswamy – Defining and Quantifying Creative Behaviour in Popular Image Generators

13:15 Nathan Gavenski – State Transition Estimation and Adversarial Fine-Tuning of Policies for Generalisable Imitation Learning

13:40 Break

14:00 Maksim Anisimov – Policy-Aware Transition Models for Off-Policy Reinforcement Learning

14:25 Jack Contro – Detecting Manipulation in Language Models

14:35 Break

14:50 Shuying Ouyang – Knowledge-Enhanced Program Repair for Data Science Code

15:15 Break

15:45 Jared Swift – Planning for Exploration in Reinforcement Learning

16:10 Jess Lally – Robust Counterfactual Inference in Markov Decision Processes

16:35 close

17:00 reception and refreshments in the McAdam Terrace Café

Abstracts:

Defining and Quantifying Creative Behaviour in Popular Image Generators – Aditi Ramaswamy

Creativity is inherently difficult to define, let alone quantify, particularly when differentiating high-quality, creative output from complete randomness. Can an AI-based image generation model exhibit creativity akin to that of human artists, and if so, how does it manifest and what factors may influence it?

Our paper aims to empirically study creative behaviour in popular image2image generation models, first by defining and creating metrics to measure three core aspects of creative behaviour, and then by conducting an experiment to quantify the effect of one factor, input temperature, on each of these metrics.

The resulting ranking of models can be important for choosing the right model to accomplish a given goal.
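
The abstract does not spell out the three metrics, so purely as an illustration of how an input-temperature experiment might be run, here is a minimal Python sketch that tracks one hypothetical diversity-style score across temperature settings; `generate_embedding`, the metric choice and all numbers are placeholders rather than the paper's actual definitions.

```python
# Illustrative sketch only: measure how one diversity-style score varies with
# the generator's input temperature. `generate_embedding` is a hypothetical
# stand-in for running an image2image model and embedding its output.
import numpy as np

rng = np.random.default_rng(0)
BASE = rng.normal(size=128)   # fake "anchor" embedding shared by all outputs


def generate_embedding(temperature: float) -> np.ndarray:
    """Placeholder for generating an image at a given temperature and embedding it;
    higher temperature adds more noise around the anchor."""
    return BASE + rng.normal(scale=temperature, size=128)


def mean_pairwise_cosine_distance(embs: np.ndarray) -> float:
    """One possible 'diversity' metric: average cosine distance between all output pairs."""
    normed = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(embs)
    return float((1.0 - sims[np.triu_indices(n, k=1)]).mean())


for temperature in (0.2, 0.6, 1.0):
    outputs = np.stack([generate_embedding(temperature) for _ in range(16)])
    print(f"temperature={temperature:.1f}  diversity={mean_pairwise_cosine_distance(outputs):.3f}")
```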

State Transition Estimation and Adversarial Fine-Tuning of Policies for Generalisable Imitation Learning – Nathan Gavenski

State-of-the-art imitation learning from observation (ILfO) methods have made significant progress recently, but they still have some limitations: they need action-based supervised optimisation, assume that transition functions are injective, and tend to act in a behaviour-seeking mode. In this work, we propose Unsupervised Imitation Learning from Observation (UILfO), a novel two-stage ILfO technique that addresses all of these limitations. In the first stage, the agent learns an initial policy and an approximation of the environment’s transition function using the teacher’s state transitions and online play; in the second, an adversarial phase fine-tunes the learnt policy to align it further with the teacher’s behaviour. We conducted a number of experiments in five widely used environments. These show that UILfO not only outperforms all other ILfO methods whilst displaying the smallest standard deviation, but also outperforms the teacher. We would argue that these results demonstrate a clear overall improvement in performance, as well as a better ability to generalise to unseen scenarios.
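
The abstract does not give UILfO's objectives or architectures, but a generic two-stage imitation-from-observation skeleton in the spirit of the description above might look like the following sketch; the inverse-model and discriminator components, the loss functions and the toy dimensions are all assumptions for illustration, not the talk's actual method.

```python
# Hypothetical two-stage skeleton (not the actual UILfO algorithm): stage one
# estimates a model of the environment's transitions from the agent's own play
# and uses it to bootstrap a policy from the teacher's state transitions;
# stage two trains a discriminator on state transitions as the basis for
# adversarial fine-tuning of that policy.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2  # toy dimensions, for illustration only

policy = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh(), nn.Linear(64, ACTION_DIM))
inverse_model = nn.Sequential(nn.Linear(2 * STATE_DIM, 64), nn.Tanh(), nn.Linear(64, ACTION_DIM))
discriminator = nn.Sequential(nn.Linear(2 * STATE_DIM, 64), nn.Tanh(), nn.Linear(64, 1))


def stage_one(agent_batch, teacher_transitions, steps=100):
    """Stage 1: agent_batch is (s, a, s') from online play; teacher_transitions is (s, s') only."""
    opt = torch.optim.Adam(list(inverse_model.parameters()) + list(policy.parameters()), lr=1e-3)
    s, a, s_next = agent_batch
    ts, ts_next = teacher_transitions
    for _ in range(steps):
        # Fit an inverse transition model on the agent's own (s, a, s') data.
        inv_loss = ((inverse_model(torch.cat([s, s_next], dim=-1)) - a) ** 2).mean()
        # Infer actions for the teacher's (s, s') pairs and behaviour-clone them.
        pseudo_actions = inverse_model(torch.cat([ts, ts_next], dim=-1)).detach()
        bc_loss = ((policy(ts) - pseudo_actions) ** 2).mean()
        opt.zero_grad()
        (inv_loss + bc_loss).backward()
        opt.step()


def stage_two(policy_transitions, teacher_transitions, steps=100):
    """Stage 2: both arguments are (s, s') tensors; the discriminator learns to tell them apart."""
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()
    fake = torch.cat(policy_transitions, dim=-1)
    real = torch.cat(teacher_transitions, dim=-1)
    for _ in range(steps):
        d_loss = bce(discriminator(real), torch.ones(len(real), 1)) + \
                 bce(discriminator(fake), torch.zeros(len(fake), 1))
        d_opt.zero_grad()
        d_loss.backward()
        d_opt.step()
    # A full implementation would now update the policy (e.g. with an RL step using
    # -log D as a reward) so that its transitions look more like the teacher's.


# Toy usage with random tensors standing in for collected data.
rand = lambda n, d: torch.randn(n, d)
stage_one((rand(32, STATE_DIM), rand(32, ACTION_DIM), rand(32, STATE_DIM)),
          (rand(32, STATE_DIM), rand(32, STATE_DIM)))
stage_two((rand(32, STATE_DIM), rand(32, STATE_DIM)),
          (rand(32, STATE_DIM), rand(32, STATE_DIM)))
```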

Policy-Aware Transition Models for Off-Policy Reinforcement Learning – Maksim Anisimov

Predicting the events that will happen when a reinforcement learning (RL) agent is deployed in the real world is important for providing safety guarantees about its behaviour. In off-policy RL, the agent’s training experience can differ significantly from its deployment experience, so learning a transition model from unadjusted training data can lead to poor performance when predicting the agent’s behaviour under the optimal policy. To mitigate this issue, we propose a policy matching (PM) algorithm inspired by causal Bayesian network factorisation. The method adjusts the learning of the transition model to take into account the difference between the agent’s interventions during training and those under the optimal policy. Experiments in popular RL environments demonstrate that the PM method tends to improve transition model performance while being more data-efficient than the naïve approach and requiring minimal assumptions.
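
The abstract does not detail how the training data is adjusted, but as a loose illustration of policy-aware model fitting in general (not the talk's PM algorithm), one could re-weight logged transitions by how likely the evaluated policy is to take the logged action relative to the behaviour policy before fitting a model with shared parameters; the linear-Gaussian model and all input names below are assumptions.

```python
# Illustrative sketch only: when the transition model shares parameters across
# state-action pairs, re-weighting each logged transition by a target/behaviour
# policy probability ratio focuses the model on the transitions it will actually
# be asked to predict under the target (e.g. optimal) policy.
import numpy as np


def fit_weighted_linear_model(S, A, S_next, pi_target, pi_behaviour):
    """Weighted least-squares fit of s' ~ W @ [s, a, 1].

    S, A, S_next : arrays of shape (n, d_s), (n, d_a), (n, d_s) from off-policy data.
    pi_target, pi_behaviour : arrays of shape (n,) with each policy's probability
    of the logged action in the logged state (hypothetical inputs).
    """
    X = np.hstack([S, A, np.ones((len(S), 1))])          # features [s, a, 1]
    w = pi_target / np.clip(pi_behaviour, 1e-8, None)    # policy-matching-style weights
    W_sqrt = np.sqrt(w)[:, None]
    # Solve the re-weighted normal equations via least squares.
    coef, *_ = np.linalg.lstsq(W_sqrt * X, W_sqrt * S_next, rcond=None)
    return coef                                           # shape (d_s + d_a + 1, d_s)
```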

Detecting Manipulation in Language Models – Jack Contro

The latest large language models (LLMs) have demonstrated almost human-level abilities to persuade. Because of this, there is a growing fear, codified in the EU AI Act, that these models will be able not just to persuade people, but also to manipulate them. To counter this potential threat, I propose a dataset of simulated conversations between chatbots and users in which the chatbot tries to manipulate the user. These conversations are then annotated with manipulation types by human annotators. I show that the models I have studied are capable of manipulative patterns, and that they are often manipulative even when not explicitly asked to be. I also provide some baseline NLP models for detecting manipulation in conversations.
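
Purely to make "baseline NLP models" concrete (the abstract does not say which baselines are used), the following sketch trains a simple TF-IDF plus logistic-regression detector on a few made-up, hand-labelled conversation snippets.

```python
# A minimal, hypothetical baseline manipulation detector: TF-IDF features over
# conversation text plus a linear classifier. The data and labels are invented
# for illustration and are not from the dataset described in the talk.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

conversations = [
    "Bot: If you really cared about your family you would buy this plan today.",
    "Bot: Here is a comparison of the three plans so you can decide.",
    "Bot: Everyone else has already upgraded; you don't want to be left behind.",
    "Bot: You can cancel at any time, so feel free to think it over.",
]
labels = [1, 0, 1, 0]  # 1 = annotated as manipulative

baseline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
baseline.fit(conversations, labels)
print(baseline.predict(["Bot: A truly good friend would not hesitate to sign up."]))
```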

Planning for Exploration in Reinforcement Learning – Jared Swift

Exploration in Reinforcement Learning (RL) has been a well-researched topic since the inception of RL, with a plethora of methods proposed that aim to perform “good” exploration, which is often measured by “regret”: the difference between the optimal return and the return actually received. Whilst many of these methods provide sound theoretical guarantees, such as bounds on the regret, they are not applicable in practice due to unrealistic assumptions made about the environment. In practice, exploration strategies that use heuristics predominantly based on randomness are ubiquitous; whilst these are easy to implement and domain-independent, they often only offer theoretical guarantees in the limit and are inefficient in practice, owing to the large number of samples they need. The main aim of this project is to develop agents that utilise models, and subsequently planning, as a heuristic for efficient, and ultimately intelligent, exploration.
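
As a quick illustration of the regret measure mentioned above, with made-up numbers:

```python
# Cumulative regret: the total shortfall of the returns actually received
# against the optimal return. All values here are invented for illustration.
optimal_return = 10.0
episode_returns = [2.0, 5.5, 7.0, 9.0, 9.5]   # returns from successive episodes

cumulative_regret = sum(optimal_return - g for g in episode_returns)
print(cumulative_regret)   # 17.0 -> efficient exploration keeps this growth small
```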

Robust Counterfactual Inference in Markov Decision Processes – Jessica Lally

Reinforcement learning (RL) is increasingly being used to support human decision-making in real-world systems. Before deploying RL-learnt policies, we must verify their safety, particularly in safety-critical domains such as healthcare. Counterfactual inference enables offline policy evaluation by predicting how an observed sequence of states and actions (generated under an existing policy) would have evolved under an alternative policy. However, existing counterfactual inference approaches for Markov decision processes (MDPs) assume a fixed causal model of the underlying system, limiting the validity (and usefulness) of the inference. We relax these assumptions by computing exact bounds on the counterfactual probabilities across all causal models, leading to more reliable counterfactual analysis. Moreover, we prove closed-form expressions for these bounds, making computation highly efficient and scalable for MDPs.
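
To give a flavour of why only bounds are available when the causal model is not fixed (the paper's MDP-specific closed-form bounds are not reproduced here), the following sketch computes the classical Fréchet-style interval for a single-step counterfactual, under the assumption that the observed action is unconfounded.

```python
# Illustrative single-step example: two causal models can agree on all
# interventional probabilities yet disagree on counterfactuals, so only an
# interval can be given. Assumes the observed action x was unconfounded.


def counterfactual_bounds(p_y_given_do_x: float, p_y2_given_do_x2: float) -> tuple[float, float]:
    """Bounds on P(Y_{x'} = y' | X = x, Y = y), given the interventional
    probabilities p = P(y | do(x)) and q = P(y' | do(x'))."""
    p, q = p_y_given_do_x, p_y2_given_do_x2
    lower = max(0.0, p + q - 1.0) / p   # Fréchet lower bound on the joint, conditioned on Y_x = y
    upper = min(p, q) / p               # Fréchet upper bound on the joint, conditioned on Y_x = y
    return lower, upper


print(counterfactual_bounds(0.8, 0.5))   # -> (0.375, 0.625)
```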