STAI CDT PhD Student Seminar Afternoon

22 January 2025

12:00 pm - 6:00 pm

We’re delighted to invite you to the UKRI Centre for Doctoral Training (CDT) in Safe and Trusted Artificial Intelligence (STAI) PhD Student Seminar Afternoon.

PhD students in the STAI CDT are hosted at King’s College London and Imperial College London. They investigate how to ensure that an AI system is safe, meaning we can have some guarantees about the correctness of its behaviour, and trusted, meaning the average user can have well-placed confidence in the system and its decision making. The focus is particularly on the use of model-based, or symbolic, AI techniques for ensuring safety and trustworthiness, since these provide an explicit language for analysing systems and their behaviours, and for reasoning and communicating about them.

This STAI CDT PhD Student Seminar Afternoon is the second of four scheduled for this academic year. We hope you can join us to find out more about the work our PhD students are doing.

We also intend to broadcast the event over Teams; if you prefer to attend remotely, please indicate this in your registration (and note that, while we will do our best to support this option, we may have to abandon it at the last minute if technical difficulties arise).

Programme (abstracts given below)

12:00 Arrival and lunch

12:30 Welcome

12:35 Sam Goring: A pragmatic approach to uncertainty quantification

13:00 Usman Islam: Reachability-based Hierarchical Planning for Goal-conditioned Reinforcement Learning 

13:25 Michelle Nwachukwu

13:50 Break

14:10 Gabriele La Malfa: The source of unfairness in multi-agent systems

14:35 Zoe Evans: Identifying and Correcting Unfairness in Reinforcement Learning for Robotics

15:00 Break

15:20 Stefan Roesch: Using Partner Choice to Encourage Cooperation in Social Dilemmas

15:45 Roko Parac: Learning Robust Reward Machines from Noisy Labels

16:10 Andrei Balcau: Detecting Collective Liquidity Taking Distributions

16:35 Close, reception and refreshments

Abstracts:

A pragmatic approach to uncertainty quantification – Sam Goring

Uncertainty quantification is essential for reliable reasoning in probabilistic settings. Yet recent work has challenged a longstanding consensus around information-theoretic approaches to formalising the measurement and decomposition of uncertainty in classification and regression tasks. We argue that these issues arise from a fundamental misspecification of the problem: the meaning and utility of uncertainty depend on an agent’s objective in subtle ways that can lead to apparent contradictions. Our research explores a pragmatic solution in which agents select axioms to ensure a measure that is invariant to just those transformations that are relevant for a given task.
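For context, the information-theoretic decomposition the abstract refers to (a standard formulation in the literature, not the speaker's own method) splits total predictive uncertainty into an aleatoric part (expected entropy across an ensemble) and an epistemic part (the mutual information, i.e. ensemble disagreement). A minimal sketch:

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def decompose(ensemble):
    """Split total predictive uncertainty over an ensemble of class-probability
    vectors into aleatoric (expected entropy) and epistemic (mutual information)
    components. Total = aleatoric + epistemic by construction."""
    k = len(ensemble[0])
    mean = [sum(p[i] for p in ensemble) / len(ensemble) for i in range(k)]
    total = entropy(mean)
    aleatoric = sum(entropy(p) for p in ensemble) / len(ensemble)
    return total, aleatoric, total - aleatoric

# Two confident but disagreeing ensemble members: uncertainty is mostly epistemic.
total, aleatoric, epistemic = decompose([[0.9, 0.1], [0.1, 0.9]])
```

Here the mean prediction is uniform, so the total uncertainty is maximal (ln 2) even though each member is individually confident; the talk's point is that which such decomposition is appropriate can depend on the agent's task.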

Reachability-based Hierarchical Planning for Goal-conditioned Reinforcement Learning – Usman Islam

Hierarchically structured policies have proven a promising tool for improving the flexibility and efficiency of goal-conditioned reinforcement learning (RL). However, more work is needed for RL to solve realistic tasks efficiently. A key issue is that low-level policies cannot easily be transferred between tasks and so must be retrained even when the difference in environment is small. Existing methods train high- and low-level policies together in order to meet the flexibility requirements of complex environments; trained low-level policies are therefore specialised to a given environment. We present a novel hierarchical training regime that allows low-level policies to be trained once and used across a wide range of complex tasks without fine-tuning. Our main insight is that for a policy to be transferable, the higher level must have a model of its performance (in order to plan around it), i.e. of which points in the goal space it can reach. Our method models the reachability of pre-trained low-level policies and uses this model alongside a graph-based high-level planner to solve complex RL environments while transferring low-level policies as black boxes. We present the method along with preliminary results demonstrating the idea on challenging robotic navigation tasks.
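To illustrate the graph-based planning idea in the abstract (a toy sketch with an invented reachability table, not the speaker's implementation): once a reachability model says which subgoals a pre-trained low-level policy can reach from each subgoal, the high-level planner reduces to a graph search whose resulting subgoal sequence is handed to the low-level policy one step at a time.

```python
from collections import deque

# Hypothetical reachability model for a toy navigation task: from each
# subgoal, which subgoals can the (black-box) low-level policy reach?
REACHABLE = {
    "start": ["A", "B"],
    "A": ["C"],
    "B": ["C"],
    "C": ["goal"],
}

def plan(src, dst):
    """Breadth-first search over the reachability graph; returns a subgoal
    sequence for the low-level policy to follow, or None if unreachable."""
    queue, parents = deque([src]), {src: None}
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in REACHABLE.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None

print(plan("start", "goal"))  # ['start', 'A', 'C', 'goal']
```

Because the planner only queries the reachability model, the low-level policy itself is never retrained when the task graph changes.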

The source of unfairness in multi-agent systems – Gabriele La Malfa

Enhancing fairness in multi-agent systems (MAS) by integrating it into agents’ policies has been extensively explored in the literature. Although this research direction is central to fairness, attributing unfairness solely to policy is insufficient for measuring and mitigating it. We identify two additional factors: the MAS rules, which shape the interaction dynamics, and the environment configurations, which determine the physical and temporal conditions of the MAS. Both factors can introduce unfairness if they favour specific individuals or groups. We structure a discussion around our proposal of a more comprehensive theory of fairness in MAS.

Identifying and Correcting Unfairness in Reinforcement Learning for Robotics – Zoe Evans

Bias has been shown to be a pervasive problem in machine learning, with severe and unanticipated consequences, for example in the form of algorithm performance disparities across social groups. This talk will cover our initial attempts to investigate and characterise how similar issues may arise in Reinforcement Learning (RL) for Human–Robot Interaction (HRI), with the intent of averting the same problems. We will present two case studies that highlight the risk of representation bias occurring in RL, as well as potential technical and social solutions to alleviate this bias.

Using Partner Choice to Encourage Cooperation in Social Dilemmas – Stefan Roesch

Mixed-motive games can be considered formal models of real-world decision problems. They differ from zero-sum settings (where one player’s gain in utility equals their opponents’ losses) and cooperative settings (where all players share the same utility function) in that each player’s utility function may be unique. This individualised signal of utility can lead to particularly prickly social interaction dynamics that cultivate pathological behavioural phenomena, ranging from sub-optimal multi-agent collaboration to the development of self-interested, and socially harmful, behavioural patterns. In this talk, we focus on a particular subset of mixed-motive games, termed Social Dilemmas, and discuss the use of a partner-choice method to mitigate the development of purely self-interested agents in such settings.

Learning Robust Reward Machines from Noisy Labels – Roko Parac

This talk will cover PROB-IRM, an approach that learns robust reward machines (RMs) for reinforcement learning (RL) agents from noisy execution traces. The key aspect of RM-driven RL is the exploitation of a finite-state machine that decomposes the agent’s task into sub-tasks. PROB-IRM uses a state-of-the-art inductive logic programming framework that is robust to noisy examples to learn RMs from noisy traces, using Bayesian posterior degrees of belief to ensure robustness against inconsistencies. Pivotal to the results is the interleaving of RM learning and policy learning: a new RM is learned whenever the RL agent generates a trace that is believed not to be accepted by the current RM. To speed up the training of the RL agent, PROB-IRM employs a probabilistic formulation of reward shaping that uses the posterior Bayesian beliefs derived from the traces. Our experimental analysis shows that PROB-IRM can learn (potentially imperfect) RMs from noisy traces and exploit them to train an RL agent to solve its tasks successfully. Despite the complexity of learning RMs from noisy traces, agents trained with PROB-IRM perform comparably to agents provided with handcrafted RMs.
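To make the reward-machine concept concrete (a hypothetical toy example with invented event names, not part of PROB-IRM itself): an RM is a finite-state machine whose states track sub-task progress, whose transitions fire on high-level events observed during execution, and whose transitions emit rewards.

```python
class RewardMachine:
    """A minimal reward machine: a finite-state machine over high-level
    events, where each transition emits a reward."""

    def __init__(self, transitions, initial, accepting):
        self.transitions = transitions  # (state, event) -> (next_state, reward)
        self.initial = initial
        self.accepting = accepting

    def run(self, events):
        """Process a trace of events; return (final state, accumulated reward).
        Events with no matching transition leave the state unchanged."""
        state, total = self.initial, 0.0
        for event in events:
            state, reward = self.transitions.get((state, event), (state, 0.0))
            total += reward
        return state, total

# RM for the task "collect the key, then open the door": reward is only
# given for opening the door *after* the key has been collected.
rm = RewardMachine(
    transitions={
        ("u0", "got_key"): ("u1", 0.0),
        ("u1", "opened_door"): ("u2", 1.0),
    },
    initial="u0",
    accepting={"u2"},
)

state, reward = rm.run(["got_key", "opened_door"])  # ('u2', 1.0)
```

PROB-IRM's challenge is that the event labels in the traces are noisy, which is why it maintains Bayesian degrees of belief over them rather than trusting each label outright.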

Detecting Collective Liquidity Taking Distributions – Andrei Balcau

Tools to identify and characterise the various types of agents in financial markets are essential for both regulators and practitioners. We introduce a methodology that combines agent-based modelling and machine learning to detect collective trading behaviour. Our detection method employs observable market variables to estimate the hidden composition of market participants. More precisely, we use the paths followed by the trend and the volatility of the midprice, and the traded volumes to infer the proportions in which different types of liquidity takers are active in the market (i.e., the market composition). We focus on a market with strategic continuous liquidity provision, populated by three common types of liquidity takers: informed traders, noise traders, and trend followers. We find that the paths of the trend and the volatility carry insufficient information about market composition when employed separately as estimators. However, when these two are non-linearly combined with the volume path, the detector performance increases substantially. Our study contributes to the financial behaviour recognition literature by offering insights into which market factors best describe the collective trading behaviour of liquidity takers.