Detecting Deception and Manipulation in Planning and Explanation Systems

Planning algorithms are used in a variety of contexts, including navigation apps, recommendation algorithms, robot vacuums and autonomous vehicles.
Companies using such algorithms have financial incentives to manipulate (or nudge) user behaviour so as to obtain valuable data or generate profitable actions.
For example, a navigation app could suggest a route that is not the fastest but uses roads for which little traffic data is available (in order to improve or sell that data). Similarly, a conversational agent could try to steer a conversation towards products that third parties would like to sell.
Furthermore, when such algorithms provide explanations for decisions, there are incentives to lie about the real reasons, for example when these could upset individual users or reveal controversial practices to the public.

The objective of this PhD project is to develop algorithms that are able to: 1) detect hidden goals in plans, and 2) detect missing or incorrect information in the output of plan-explanation algorithms.
The project will explore the potential of tools such as preference estimation, inverse optimisation, inverse reinforcement learning and goal recognition to detect hidden goals, as well as tools from experiment optimisation and software testing to speed up the search for conflicts between explanations that reveal incorrectness.
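
To make the idea of detecting a hidden goal concrete, here is a minimal sketch in the spirit of cost-based goal recognition, applied to the navigation example. It is an illustration only, not the method the project will develop. The toy road network, the `data_bonus` values and the candidate objective named `data_collection` are all hypothetical. The sketch scores each candidate objective by the regret of the suggested route under it (route cost minus optimal cost); an objective with near-zero regret explains the route well.

```python
import heapq

def shortest_cost(graph, weights, start, goal):
    """Dijkstra: cost of the cheapest start->goal path under the given edge weights."""
    dist = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt in graph[node]:
            nd = d + weights[(node, nxt)]
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(queue, (nd, nxt))
    return float("inf")

def path_cost(path, weights):
    """Sum of edge weights along a path given as a list of nodes."""
    return sum(weights[(a, b)] for a, b in zip(path, path[1:]))

def regret(path, graph, weights):
    """How much worse the observed path is than the optimum under this objective."""
    return path_cost(path, weights) - shortest_cost(graph, weights, path[0], path[-1])

# Hypothetical toy road network (directed edges).
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

# Stated objective: travel time in minutes (made-up numbers).
travel_time = {("A", "B"): 5, ("B", "D"): 5, ("A", "C"): 7, ("C", "D"): 8}

# Hypothetical hidden objective: travel time minus a bonus for roads
# on which the provider has little traffic data (assumed values).
data_bonus = {("A", "B"): 0, ("B", "D"): 0, ("A", "C"): 4, ("C", "D"): 4}
data_collection = {e: travel_time[e] - data_bonus[e] for e in travel_time}

# Route suggested by the (simulated) app.
suggested = ["A", "C", "D"]

regrets = {
    "travel_time": regret(suggested, graph, travel_time),
    "data_collection": regret(suggested, graph, data_collection),
}
best_explanation = min(regrets, key=regrets.get)
print(regrets, best_explanation)
```

In this toy case the suggested route is 5 minutes slower than the fastest one, yet it is optimal under the assumed data-collection objective, so the latter is the better explanation of the plan. A realistic detector would need to handle unknown or learned objectives, noise, and negative edge weights, none of which this sketch addresses.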

The project will focus on a small set of use cases (depending on student interests and simulation difficulty) such as navigation apps, home robots, recommendation algorithms or conversation agents.

– M. Chromik, M. Eiband, S. Völkel, D. Buschek, “Dark Patterns of Explainability, Transparency, and User Control for Intelligent Systems,” IUI Workshops, 2019.
– S. Sohrabi, A. Riabov, O. Udrea, “Plan Recognition as Planning Revisited,” in IJCAI, 2016.
– S. Arora, P. Doshi, “A survey of inverse reinforcement learning: Challenges, methods and progress,” in Artificial Intelligence, 2021.

Project ID

STAI-CDT-2023-KCL-2

Supervisor

Martim Brandao (https://www.martimbrandao.com/)

Category

AI Planning