Neural-symbolic Reinforcement Learning

Recent advances in deep reinforcement learning (DRL) have allowed computer programs to beat humans at complex games like Chess or Go years ahead of the original projections. However, the state of the art (SOTA) in DRL misses out on some of the core cognitive capabilities we would like to extend to machines. Specifically, DRL lacks transferability. Transferability can be “transductive” or “inductive”. The former refers to the problem of deploying a (DRL) model trained in a given source domain to a new target domain that might have different features, in order to solve the same task. The inability to generalise makes pure DRL methods inappropriate for this problem. Inductive transfer refers instead to the situation where a DRL model trained to solve a specific task is deployed to solve a more general task (possibly within the same domain) that requires skills learned in the first task.

Recently, hierarchical RL methods have tried to tackle inductive transfer, but they still tend to rely upon a pre-engineered structure of tasks and subtasks. Augmenting DRL with logic-based learning could enable DRL to learn general properties of the explored domains, and policies expressed at a higher level of abstraction (e.g. in terms of these general properties), which can more easily be applied to similar but different target domains. Similarly, the ability of logic-based learning to acquire new knowledge by making use not only of labelled examples but also of existing (partial) knowledge can provide DRL with the means to learn new skills by leveraging both exploration-driven evidence and already learned skills.
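To give a concrete flavour of learning from labelled examples plus background knowledge, the toy Python sketch below (the facts, examples and hypothesis space are all hypothetical illustrations, not the project's actual machinery) searches a small space of candidate rule bodies for ones that, evaluated against background facts, cover all positive examples and no negative ones. Systems such as ILASP and FastLAS [3,4] solve far richer versions of this task.

    # Toy sketch of logic-based learning: find rule bodies consistent
    # with labelled examples, given background facts. All names are
    # hypothetical; real systems like FastLAS [4] additionally apply
    # domain-specific optimisation criteria to pick among candidates.

    background = {("has_key", "s1"), ("has_key", "s2"), ("door_open", "s2")}
    positives = {"s2"}        # states where the target concept holds
    negatives = {"s0", "s1"}  # states where it does not

    # Candidate rule bodies: conjunctions of known properties.
    hypothesis_space = [
        ("has_key",),
        ("door_open",),
        ("has_key", "door_open"),
    ]

    def covers(body, state):
        return all((prop, state) in background for prop in body)

    for body in hypothesis_space:
        if all(covers(body, s) for s in positives) and \
           not any(covers(body, s) for s in negatives):
            print("consistent rule body:", " and ".join(body))

Running this prints both consistent bodies ("door_open" and "has_key and door_open"); a real logic-based learner would prefer one according to generality or a cost function.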

The goal of this project is to develop a novel neural-symbolic reinforcement learning approach that tackles transductive and inductive transfer by combining RL exploration of the environment with logic-based learning of high-level policies. The project will build on recent results in the SPIKE group, led by Prof Russo, on combining low-level neural options with high-level meta-policies (i.e. policies over the options) expressed in the form of (structured) automata [1,2], exploiting recent advances in the field of logic-based machine learning [3,4].
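As a rough illustration only, the following minimal Python sketch (the toy "key then door" task, state names and option names are all hypothetical) shows the shape of a subgoal automaton used as a meta-policy: symbolic observations drive transitions between automaton states, and each automaton state dictates which low-level option to execute.

    # Minimal sketch of a subgoal automaton acting as a meta-policy
    # over options, in the spirit of [1,2]. Everything here is a
    # hypothetical illustration.

    class SubgoalAutomaton:
        def __init__(self, initial, accepting, transitions, option_for_state):
            self.state = initial
            self.accepting = accepting
            self.transitions = transitions            # (state, proposition) -> state
            self.option_for_state = option_for_state  # state -> option id

        def step(self, labels):
            # Advance on the set of propositions produced by the
            # labelling function after an environment step.
            for p in labels:
                if (self.state, p) in self.transitions:
                    self.state = self.transitions[(self.state, p)]
            return self.option_for_state.get(self.state)

        @property
        def done(self):
            return self.state in self.accepting

    # Toy instance: run option "go_to_key" until "has_key" holds,
    # then option "go_to_door" until "door_open" holds.
    automaton = SubgoalAutomaton(
        initial="u0",
        accepting={"u_acc"},
        transitions={("u0", "has_key"): "u1", ("u1", "door_open"): "u_acc"},
        option_for_state={"u0": "go_to_key", "u1": "go_to_door"},
    )

    # Each option would be a learned neural sub-policy; here we only
    # trace which option the automaton selects at each step.
    for labels in [set(), {"has_key"}, set(), {"door_open"}]:
        option = automaton.step(labels)
        print(automaton.state, option)

In the actual framework of [1,2] the automaton is not hand-written as here: it is induced from the agent's exploration traces by a logic-based learner, and exploited to guide the options.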

The project will look in particular at tasks from the AnimalAI and Meta-World benchmarks.

Possible directions for improving the current SOTA include:

– Learning hierarchical meta-policies in the form of hierarchical automata, to facilitate inductive transfer of RL agents.

– Learning meta-policies over abstract observations instead of ground observations, so as to facilitate generalisability and scalability (see the sketch after this list).

– Integrating the logic-based learning of meta-policies with object-centric methods for detecting properties in the environment, and with pre-trained options. This will build upon the recent work presented at IJCLR 2021 [5].
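The sketch below gives a rough, purely hypothetical illustration of the abstraction step underlying the last two directions: ground observations (here, object coordinates, as might be produced by an object-centric detector) are lifted to abstract relational facts over which a meta-policy can then be expressed.

    # Hypothetical sketch: lift a ground observation (object name ->
    # (x, y) position) to abstract facts such as near(agent, key),
    # so that a logic-based meta-policy ranges over properties
    # rather than raw coordinates. All names and values are toy.

    from math import dist

    def abstract_observation(objects, agent_pos, radius=1.5):
        facts = set()
        for name, pos in objects.items():
            if dist(agent_pos, pos) <= radius:
                facts.add(("near", "agent", name))
        return facts

    ground = {"key": (1.0, 1.0), "door": (4.0, 2.0)}
    print(abstract_observation(ground, agent_pos=(1.5, 1.0)))
    # -> {('near', 'agent', 'key')}

A meta-policy stated over facts like near(agent, key) transfers unchanged to domains where the objects sit at different coordinates, which is precisely the generalisability the second direction targets.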


[1] Daniel Furelos-Blanco, Mark Law, Anders Jonsson, Krysia Broda, Alessandra Russo: Induction and Exploitation of Subgoal Automata for Reinforcement Learning. J. Artif. Intell. Res. 70: 1031-1116, 2021.

[2] Daniel Furelos-Blanco, Mark Law, Alessandra Russo, Krysia Broda, Anders Jonsson: Induction of Subgoal Automata for Reinforcement Learning. AAAI 2020: 3890-3897.

[3] Mark Law, Alessandra Russo, Krysia Broda, Elisa Bertino: Scalable Non-observational Predicate Learning in ASP. IJCAI 2021: 1936-1943.

[4] Mark Law, Alessandra Russo, Elisa Bertino, Krysia Broda, Jorge Lobo: FastLAS: Scalable Inductive Logic Programming Incorporating Domain-Specific Optimisation Criteria. AAAI 2020: 2877-2885.

[5] Ludovico Mitchener, David Tuckey, Matthew Crosby, Alessandra Russo: Detect, Understand, Act: A Neuro-Symbolic Hierarchical Reinforcement Learning Framework. NeSy 2021.

Note:
The current 1st-year STAI PhD student allocated to this project will explore the learning of stochastic automata in combination with RL agents, building on probabilistic circuit techniques instead of logic-based learning. Probabilistic circuits are a scalable stochastic method for learning and reasoning over symbolic structures. Any of the three highlighted directions of research will differ from the PhD topic undertaken by the current STAI PhD student.

Project ID

STAI-CDT-2021-IC-2

Supervisor

Alessandra Russo
http://wp.doc.ic.ac.uk/arusso/

Category

AI Planning, Logic