Reward Synthesis from Logical Specifications

The growing societal impact of AI-based systems has brought with it a set of risks and concerns [1, 2].
Indeed, unintended and harmful behaviours may emerge from the application of machine learning (ML) algorithms, including negative side effects, reward hacking, and unsafe exploration [3].
Hence, the safe and trusted deployment of AI systems calls for their verification and certification. In particular, verifiable artificial intelligence (VAI) has been described as “the goal of designing AI-based systems that have strong, ideally provable, assurances of correctness with respect to mathematically-specified requirements.” [4]
Verifiable AI can be achieved by applying formal methods, which have already proved successful in the specification, design, and verification of reactive, distributed, and multi-agent systems.
However, the application of formal methods to AI raises a number of challenges, from modelling continuous behaviours and formalising quantitative requirements, to handling high-dimensional input and parameter spaces, and supporting online adaptation and evolution.

This project contributes to the goal of verifiable AI through the development of techniques to synthesise reward functions in reinforcement learning (RL) directly from logical specifications.
Reinforcement learning is one of the most successful and widely used techniques to learn control policies for a variety of tasks [5]. However, one of the challenges to the concrete application of RL is the design of reward functions that are guaranteed to induce the intended behaviour in agents. The designer often needs to shape the reward function manually, both to ensure convergence of the learning algorithm and to avoid reward hacking.
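To make the idea of reward shaping concrete, the following is a minimal sketch of potential-based reward shaping (Ng, Harada and Russell, 1999), which is known to preserve the optimal policy. The grid-world setting, the potential function phi, and the GOAL coordinates are illustrative assumptions, not part of this project's specification.

```python
# Potential-based reward shaping: the shaped reward
#   r' = r + gamma * phi(s') - phi(s)
# adds a dense learning signal while provably preserving optimal policies.
# The grid world, GOAL, and the choice of phi are illustrative assumptions.

GAMMA = 0.99
GOAL = (4, 4)

def phi(state):
    """Heuristic potential: negative Manhattan distance to the goal."""
    x, y = state
    return -(abs(x - GOAL[0]) + abs(y - GOAL[1]))

def shaped_reward(state, next_state, reward):
    """Augment the environment reward with the shaping term."""
    return reward + GAMMA * phi(next_state) - phi(state)
```

In this sketch, a transition that moves towards the goal receives a positive shaping bonus and one that moves away is penalised, even when the environment reward is zero.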
Temporal logics, on the other hand, are a rich and compact formalism for expressing complex tasks, including multiple safety and reachability constraints [6]. However, it is normally difficult to translate temporal formulas directly into reward functions, although contributions on this subject are beginning to appear [7].
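One common route from temporal formulas to rewards is to track the formula with a finite automaton and reward the agent when an accepting state is reached. The toy sketch below encodes the sequencing task "reach A, then reach B" this way; the labels 'a'/'b', the transition table, and the 0/1 reward scheme are assumptions made for illustration.

```python
# A temporal task "reach A, then reach B" tracked by a small deterministic
# automaton; the agent is rewarded exactly when the automaton accepts.
# Labels, states, and the transition table are illustrative assumptions.

TRANSITIONS = {
    (0, 'a'): 1,   # first subgoal A reached
    (1, 'b'): 2,   # second subgoal B reached -> accepting
}
ACCEPTING = 2

def step(q, label):
    """Advance the automaton on the label observed in the current state;
    stay put on labels with no outgoing transition."""
    return TRANSITIONS.get((q, label), q)

def reward(q, label):
    """Reward 1.0 exactly on the transition that completes the task."""
    return 1.0 if step(q, label) == ACCEPTING and q != ACCEPTING else 0.0
```

Note that seeing 'b' before 'a' yields no progress and no reward, matching the intended temporal ordering of the task.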

Your contribution: you will develop a logic-based language for specifying complex, temporally extended tasks in reinforcement learning, along with algorithms to compile specifications written in this language into reward functions, so as to perform reward shaping automatically. You will implement this translation in a tool and compare it with the state of the art, including the SPECTRL tool from [7]. You will also consider multi-agent settings, where several agents may behave cooperatively or competitively to satisfy a temporal goal, as well as partial observability, where agents have only imperfect knowledge of the environment they act in.
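To illustrate what "compiling a composable specification into a reward function" might look like, here is a toy sketch in the spirit of the composable specifications of [7]. All names (achieve, seq, to_reward) and the trajectory-based semantics are invented for this example and are not the SPECTRL API.

```python
# Toy composable specification language: specs are predicates over
# trajectories, built from 'achieve' and sequential composition, and
# compiled into a sparse reward. All names here are illustrative
# assumptions, not the SPECTRL API.

def achieve(pred):
    """Spec satisfied once `pred` holds in some visited state."""
    def monitor(states):
        return any(pred(s) for s in states)
    return monitor

def seq(spec1, spec2):
    """Spec satisfied if spec1 holds on a prefix and spec2 on the rest."""
    def monitor(states):
        return any(spec1(states[:i]) and spec2(states[i:])
                   for i in range(1, len(states)))
    return monitor

def to_reward(spec):
    """Compile a spec into a sparse trajectory-level reward."""
    return lambda states: 1.0 if spec(states) else 0.0

# Example task: first visit a state with x > 3, then return near x = 0.
task = seq(achieve(lambda s: s[0] > 3), achieve(lambda s: abs(s[0]) < 1))
r = to_reward(task)
```

A practical tool would replace the quadratic trajectory check with an online monitor (as in the automaton sketch above) and combine it with shaping so the reward is not sparse, but this sketch shows the compilation idea end to end.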
The results achieved in this project will contribute towards the goal of safe and verifiable AI.

[1] S. Russell, T. Dietterich, E. Horvitz, B. Selman, F. Rossi, D. Hassabis, S. Legg, M. Suleyman, D. George, and S. Phoenix. Letter to the Editor: Research Priorities for Robust and Beneficial Artificial Intelligence: An Open Letter. AI Magazine, 36:3–4, 2015.

[2] T. G. Dietterich and E. J. Horvitz. Rise of Concerns about AI: Reflections and Directions. Commun. ACM, 58:38–40, 2015.

[3] D. Amodei, C. Olah, J. Steinhardt, P. F. Christiano, J. Schulman, and D. Mané. Concrete Problems in AI Safety. CoRR, abs/1606.06565, 2016.

[4] S. A. Seshia and D. Sadigh. Towards Verified Artificial Intelligence. CoRR, abs/1606.08514, 2016.

[5] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

[6] M. Huth and M. Ryan. Logic in Computer Science. Cambridge University Press, 2012.

[7] K. Jothimurugan, R. Alur, and O. Bastani. A Composable Specification Language for Reinforcement Learning Tasks. In NeurIPS, 2019.

Project ID



Francesco Belardinelli


AI Planning, Logic, Verification