A framework for verifying the safety and trustworthiness of AI systems

AI systems are increasingly used to aid human decision-making. While AI systems have achieved highly accurate results across a variety of tasks, they often lack explainability and transparent reasoning. Moreover, the choice of datasets used to train and evaluate AI systems is usually driven by availability and size rather than by a thorough inspection of their contents. These issues undermine the trustworthiness of AI systems and limit their adoption in the wider social context.

This PhD project aims to define methodologies in three key areas that pave the way towards safe and trusted AI systems: bias, explainability, and safety. The project will develop methods that check whether the training dataset is biased, so that unfair outputs from the AI system can be avoided. It will use computational argumentation as the basis for explaining the system's outputs, with the potential to feed human feedback back into the models. Various types of explanations will be considered to address the needs of different stakeholders, from researchers developing similar models to non-experts. The project will also develop methods that test the vulnerability of AI systems to small perturbations of the training dataset. Thus, the PhD project will address bias, explainability, and safety, which are key to ensuring safe and trusted AI.

Project ID

STAI-CDT-2021-KCL-1

Supervisor