Ensuring that reinforcement learning produces policies that are safe and verifiable is crucial, and my project aims to achieve this by integrating Formal Methods, for example using temporal logic to shield the policy from performing unsafe actions, or synthesising reward monitors for precise reward specifications that incorporate safety components. To date, research has largely focused on single-agent systems; multi-agent systems exhibit different properties of interest (e.g. strategic and epistemic properties) and require further work on pragmatic methods that scale to the multi-agent setting.
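
As a rough illustration of the shielding idea, the sketch below wraps a policy's proposed action with a safety monitor derived (offline, in principle) from a temporal-logic specification. The monitor here is a hand-coded toy automaton, and all names (SafetyMonitor, shielded_action, the "never enter the zone twice in a row" spec) are illustrative assumptions of mine, not components of the project.

```python
# Minimal sketch of action shielding: the shield consults a safety monitor
# (a small DFA standing in for one synthesised from a temporal-logic spec)
# and overrides any proposed action that would reach a bad monitor state.

from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class SafetyMonitor:
    """Tracks the state of a safety automaton over executed actions."""
    delta: Callable[[int, str], int]          # transition function over action labels
    state: int = 0                            # current automaton state
    bad_states: Set[int] = field(default_factory=lambda: {2})

    def is_safe(self, label: str) -> bool:
        return self.delta(self.state, label) not in self.bad_states

    def advance(self, label: str) -> None:
        self.state = self.delta(self.state, label)


def shielded_action(monitor: SafetyMonitor, proposed: str, fallback: str) -> str:
    """Return the policy's proposed action if the monitor accepts it, else a safe fallback."""
    return proposed if monitor.is_safe(proposed) else fallback


# Toy specification: "never execute 'enter_zone' twice in a row" (state 2 is bad).
def delta(q: int, label: str) -> int:
    if q == 0:
        return 1 if label == "enter_zone" else 0
    if q == 1:
        return 2 if label == "enter_zone" else 0
    return 2  # absorbing bad state


if __name__ == "__main__":
    monitor = SafetyMonitor(delta=delta)
    for proposed in ["enter_zone", "enter_zone", "move"]:
        executed = shielded_action(monitor, proposed, fallback="move")
        monitor.advance(executed)
        print(f"proposed={proposed:10s} executed={executed}")
```

In a full shielding setup the automaton would be synthesised from the temporal-logic specification and composed with an abstraction of the environment, so the shield only permits actions from which the specification remains satisfiable; the sketch above only shows the per-step interception mechanism.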