In Reinforcement Learning (RL) autonomous agents have typically to choose their actions in order to maximise some notion of cumulative reward . Tools and techniques for RL have been applied successfully to domain as diverse as game theory, control theory, multi-agent systems, and swarm intelligence.
However, in several cases we do not see our agents simply as reward-maximisers: while learning to perform a certain task (say, exploring an unknown space) we might want our software agents never to enter a “deadlock” state, from which they are not able to get out. Similarly, message-passing agents need always to be responsive, i.e., able to receive and send messages.
These multi-agent scenarios calls for specification languages expressive enough to capture the behaviour of learning agents, as well as model checking methods to tackle the verification of properties such as those above.
However, most of the research on formal methods for strategic reasoning has focused on hard-coded (non-learning) agents. But for the development of safe and trusted AI systems it is key that agents are able to learn in a safe and secure manner.
Your contribution: you will first extend Markov-decision processes, the formalisms of choice for RL, to multi-agent systems by considering the simultaneous execution of several learning agents.
Then, you will develop an interpretation of Alternating-time Temporal Logic , one of the most popular logic-based languages to specify the strategic behaviour of agents, on these multi-agent Markov decision processes. In particular, the satisfaction of the agents goals will be evaluated only w.r.t. executions that maximise the agents’ reward.
In this context you will analyse the complexity of the corresponding model checking problem. We anticipate that in selected cases the verification task will be decidable. For such cases an implementation might be envisaged as part of the project.
 Sutton, Richard S.; Barto, Andrew G. (1998). Reinforcement Learning: An Introduction. MIT Press. ISBN 0-262-19398-1.
 Alur, R.; Henzinger, T.; and Kupferman, O. 2002. Alternating-time temporal logic. Journal of the ACM 49(5):672713.