In recent years, machine learning has achieved human-level or superhuman performance in a number of areas. Despite these successes, machine learning systems, in particular deep neural networks, lack transparent decision-making processes, which limits their trustworthiness.
This lack of trust restricts the applicability of such algorithms and has led to growing interest in a field of AI research known as explainable artificial intelligence (XAI), which investigates and develops methods for building explainable and, crucially, trustworthy AI.
This report describes current state-of-the-art methods for generating explanations of deep neural network decisions. These methods are evaluated by comparing how human-understandable (interpretable) and how reliable (faithful) the explanations they generate are. The report further identifies a trade-off between these two qualities and suggests that a plausible remedy is to augment current faithful methods to be more interpretable.
A line of investigation addressing this challenge is proposed, and initial empirical results are described. These results support continued work in this research direction, and a timeline of future tasks, challenges, and deliverables is provided.