Generating faithful and interpretable explanations for the classifications of a deep neural network

Sophia Kalanovska, Year 2 STAI CDT Student

5 March 2021

2:00 pm - 3:00 pm

In recent years, machine learning has made great strides, achieving human-level or better performance in a number of areas. Despite this success, machine learning systems, in particular deep neural networks, lack transparent decision-making processes, which limits their trustworthiness.

This lack of trust limits the applicability of such algorithms and has led to increasing interest in a field within AI research known as explainable artificial intelligence (XAI). XAI investigates and develops methods for building explainable and, crucially, trustworthy AI.

This report describes the current state-of-the-art methods for generating explanations for deep neural networks. These methods are evaluated by comparing how human-understandable (interpretable) and how reliable (faithful) their generated explanations are. The report further identifies a trade-off between these two qualities and suggests that a plausible remedy is to augment current faithful methods so that they become more interpretable.

A line of investigation addressing this challenge is proposed, and preliminary empirical results are described. These results support continued development in this research direction, and the report provides a timeline of future tasks, challenges, and deliverables.