Reinforcement Learning (RL), and its extension Deep Reinforcement Learning (DRL), are Machine Learning (ML) techniques that allow autonomous agents to learn optimal behaviours (called policies) in unknown and complex environments through models of rewards and penalties.
DRL algorithms have shown great potential in high-dimensional domains such as Atari games [1], which has attracted considerable interest in the ML community. However, in real-world scenarios these algorithms will need to tackle a wide range of specifications, frequently more than one at a time, and some will concern critical safety constraints that the learning agent will be required to satisfy. Researchers are working in this direction [2]; however, further work is needed in settings where multiple learning agents act concurrently.
The aim of this project is to develop Safe through Abstraction (multi-agent) Reinforcement learning (StAR), a methodology to formally guarantee the safe behaviour of agents acting in an unknown environment by ensuring that the policy learnt by an RL algorithm satisfies the given safety constraints, both at training and at test time.
Building upon recent advances in this area, including work by the applicant [3], we plan to combine reinforcement learning and formal methods to ensure the satisfaction of constraints expressed in (probabilistic) temporal logic (PTL) [4]. More specifically, the main intended contributions of the project can be summarized as follows:
– To introduce the notion of abstract Markov decision process (AMDP) and present a procedure to generate AMDPs automatically. This step is meant to reduce the complexity of the verification task.
– To develop strategy-synthesis techniques that produce policies at the abstract level satisfying the safety constraints expressed in PTL.
– To construct a shield that restricts the agents' behaviour at training time according to the safe abstract policies synthesized in the previous step, thus guaranteeing the satisfaction of the relevant constraints (a minimal illustrative sketch follows this list).
– To implement and evaluate the proposed approach in different scenarios involving multiple agents. Benchmarks will be identified during the project, with autonomous vehicles and robot navigation being among relevant candidates. As a proof of concept, the approach will be first evaluated on the guarded-flag collection domain of [2,3].
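To make the intended abstraction-and-shielding pipeline more concrete, the following is a minimal Python sketch written for this summary only; the Shield class, the abstraction map, the toy corridor domain, and the hard-coded safe_actions table are illustrative assumptions and not the project's actual design. The safe_actions table stands in for the output of the strategy-synthesis step (which in practice would be obtained with a probabilistic model checker), and the shield overrides any unsafe action proposed by the learning agent at training time.

from typing import Callable, Dict, Set, Hashable
import random

State = Hashable           # concrete environment state
AbstractState = Hashable   # state of the abstract MDP (AMDP)
Action = Hashable


class Shield:
    """Restricts an RL agent to actions allowed by a safe abstract policy."""

    def __init__(
        self,
        abstraction: Callable[[State], AbstractState],
        safe_actions: Dict[AbstractState, Set[Action]],
    ) -> None:
        self.abstraction = abstraction      # alpha: concrete state -> abstract state
        self.safe_actions = safe_actions    # assumed output of the offline synthesis step

    def filter(self, state: State, proposed: Action) -> Action:
        """Keep the proposed action if it is safe in the current abstract state;
        otherwise fall back to an arbitrary verified-safe action."""
        allowed = self.safe_actions[self.abstraction(state)]
        if proposed in allowed:
            return proposed
        return random.choice(sorted(allowed))


# Toy usage: a 1-D corridor with positions 0..5, where positions >= 4 are unsafe.
# Concrete positions are abstracted into "far" / "near" the unsafe region.
abstraction = lambda pos: "near" if pos >= 3 else "far"
safe_actions = {"far": {"left", "right"}, "near": {"left"}}

shield = Shield(abstraction, safe_actions)
print(shield.filter(1, "right"))   # -> "right" (allowed far from the hazard)
print(shield.filter(4, "right"))   # -> "left"  (overridden near the hazard)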
The methodology outlined above is meant to provide formal guarantees, by design, that the behaviour of the agents always meets the safety requirements. We plan to evaluate the proposed approach empirically on relevant benchmarks (including autonomous vehicles and robot navigation) and to show its effectiveness while degrading the agents' learning process as little as possible. Furthermore, the StAR framework is to be extended to environments with multiple agents, which will require solving problems specific to the multi-agent setting. In particular, multi-agent systems are prone to the curse of dimensionality; it is therefore crucial to build abstractions small enough to ensure tractability, while preserving the satisfaction of all relevant constraints.
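For illustration only (the atomic proposition "collision" and the 0.99 threshold are placeholders of ours, not values fixed by the project), a safety constraint of the intended kind can be written in PCTL [4] as

$P_{\geq 0.99}\,[\,\mathrm{G}\,\neg\mathit{collision}\,]$

read as: with probability at least 0.99, a collision state is never reached under the policy being executed.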
[1] V. Mnih et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015. https://web.stanford.edu/class/psych209/Readings/MnihEtAlHassibis15NatureControlDeepRL.pdf
[2] G. R. Mason et al. Assured reinforcement learning with formally verified abstract policies. In ICAART 2017, 2017.
[3] P. El Mqirmi, F. Belardinelli, and B. G. León. An abstraction-based method to check multi-agent deep reinforcement-learning behaviors. In AAMAS 2021, pages 474–482. IFAAMAS, 2021.
[4] H. Hansson and B. Jonsson. A logic for reasoning about time and reliability. Formal Aspects of Computing, 6(5):512–535, 1994.