Deep Reinforcement Learning (DRL) has proved to be a powerful technique that allows autonomous agents to learn optimal behaviours (i.e., policies) in unknown and complex environments through models of rewards and penalties .
By extending DRL with formal specifications expressed in Temporal Logic (TL), researchers have developed algorithms able to learn multiple tasks at the same time .
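One common way such TL specifications are connected to RL is by compiling a formula into a finite automaton (often called a reward machine) that tracks task progress and emits rewards. The sketch below is illustrative only, with all names hypothetical and not taken from the project text; it shows how a task like "eventually reach A, then eventually reach B, while always avoiding C" can drive the reward signal, so that one agent can learn several tasks simply by swapping automata.

```python
# Hypothetical sketch: a reward-machine-style automaton for a TL task.
# State names, labels, and reward values are illustrative assumptions.

class RewardMachine:
    def __init__(self, transitions, initial, accepting, trap):
        self.transitions = transitions  # {(state, label): next_state}
        self.state = initial
        self.accepting = accepting      # task completed
        self.trap = trap                # safety violation, e.g. "always not C" broken

    def step(self, label):
        """Advance on the label observed in the environment; return a reward."""
        self.state = self.transitions.get((self.state, label), self.state)
        if self.state == self.accepting:
            return 1.0
        if self.state == self.trap:
            return -1.0
        return 0.0

# Automaton for "eventually A, then eventually B, globally not C":
# u0 --A--> u1 --B--> acc, with C leading to the trap state from anywhere.
rm = RewardMachine(
    transitions={("u0", "A"): "u1", ("u1", "B"): "acc",
                 ("u0", "C"): "trap", ("u1", "C"): "trap"},
    initial="u0", accepting="acc", trap="trap",
)
rewards = [rm.step(lbl) for lbl in ["A", "B"]]  # → [0.0, 1.0]
```

In practice such automata are produced automatically from the TL formula by dedicated tools rather than written by hand; the point here is only that the automaton state, paired with the environment state, restores the Markov property for non-Markovian TL objectives.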
The earlier success of model-free RL , i.e., RL algorithms that are purely “reactive” to their environment, neither trying to fully understand it nor planning ahead, explains why the great majority of works on Safe-RL and Temporal Logic have focused on model-free RL.
In contrast, model-based RL agents try to understand their surroundings and build their own model of the world around them. These agents then use this model to “imagine” what will happen in the future under each of the available actions, and choose the policy with the maximum expected reward. Earlier model-based RL work fell behind its model-free counterparts. However, promising new contributions have sparked renewed interest in this subject: MuZero , an algorithm able to play Atari, Go, Chess and Shogi at super-human level; and PlaNet and Dreamer , which achieved impressive results in modelling and locomotion tasks in visual-input environments while requiring significantly less training data than model-free agents.
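The "imagine futures, then pick the best first action" loop described above can be made concrete with one of the simplest model-based planners, random-shooting model-predictive control. The sketch below is a minimal illustration under assumed toy dynamics (a hand-coded 1-D world standing in for a learned model); the function names and parameters are illustrative, not from any of the cited systems.

```python
import numpy as np

def plan(state, dynamics, reward, horizon=5, n_candidates=100, rng=None):
    """Random-shooting MPC: sample action sequences, roll each out in the
    (learned) model, and return the first action of the best sequence."""
    rng = rng or np.random.default_rng(0)
    best_return, best_action = -np.inf, None
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=horizon)  # candidate sequence
        s, total = state, 0.0
        for a in actions:          # "imagine" the future inside the model
            s = dynamics(s, a)
            total += reward(s)
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action             # MPC: execute only the first action, replan next step

# Toy stand-in for a learned model: 1-D state nudged by the action,
# rewarded for being close to the origin.
dynamics = lambda s, a: s + 0.1 * a
reward = lambda s: -abs(s)

action = plan(state=2.0, dynamics=dynamics, reward=reward)
```

Real systems such as PlaNet replace the hand-coded `dynamics` with a model learned from experience and use smarter optimizers than uniform random sampling, but the planning loop has this same shape.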
Your contribution: The goal of this project is to design a new method that combines model-based RL with Temporal-Logic-based specifications in multi-task, safety-aware scenarios. You will first explore the literature on RL with TL and on model-based RL to become familiar with the topics. Then you will implement an algorithm combining these components and apply it in a multi-task scenario similar to the one in .