Recent advances in deep reinforcement learning (DRL) have allowed computer programs to beat humans at complex games such as Chess and Go, years before the original projections. However, the SOTA in DRL misses out on some of the core cognitive capabilities we would like to extend to machines; specifically, DRL lacks transferability. Transferability can be “transductive” or “inductive”. The former refers to the problem of deploying a (DRL) model trained in a given source domain to a new target domain that might have different features, in order to solve the same task. The inability to generalise makes pure DRL methods inappropriate for this problem. Inductive transfer refers instead to the situation where a DRL model trained to solve a specific task is deployed to solve a more general task (possibly within the same domain) that requires skills learned in the first task.
Recently, hierarchical RL methods have tried to tackle inductive transfer, but they still tend to rely upon a pre-engineered structure of tasks and subtasks. Augmenting DRL with logic-based learning could enable DRL to learn general properties about the domains of exploration, and policies expressed at a higher level of abstraction (e.g. in terms of these general properties) that are more easily applicable to similar but different target domains. Similarly, the ability of logic-based learning to learn new knowledge by making use of not only labelled examples but also existing (partial) knowledge can provide DRL with the means to learn new skills by leveraging both exploration-driven evidence and already learned skills.
The goal of this project is to develop a novel neural-symbolic reinforcement learning approach to tackle transductive and inductive transfer by combining RL exploration of the environment with logic-based learning of high-level policies. The project will build on recent results in the SPIKE group, led by Prof Russo, on combining low-level neural options with high-level meta-policies (i.e. policies over the options) expressed in the form of (structured) automata [1,2], exploiting recent advancements made in the field of logic-based machine learning [3,4].
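As a rough illustration of the options-plus-automaton setup (a minimal sketch; the class, state names, and transitions below are hypothetical and not taken from [1,2]), a meta-policy can be read as a finite-state automaton whose states select which pre-trained option controls the agent, and whose transitions fire on high-level events observed in the environment:

```python
# Hypothetical sketch: a meta-policy as a finite-state automaton over options.
# Each automaton state selects a pre-trained option; transitions fire on
# high-level propositions observed during execution.

class AutomatonMetaPolicy:
    def __init__(self, transitions, option_for_state, initial_state):
        self.transitions = transitions            # (state, proposition) -> next state
        self.option_for_state = option_for_state  # state -> option name
        self.state = initial_state

    def step(self, propositions):
        # Advance the automaton on any observed proposition that triggers
        # a transition, then return the option for the current state.
        for p in propositions:
            nxt = self.transitions.get((self.state, p))
            if nxt is not None:
                self.state = nxt
        return self.option_for_state[self.state]


# Example task: "reach the key, then reach the door".
policy = AutomatonMetaPolicy(
    transitions={("q0", "has_key"): "q1", ("q1", "at_door"): "q_goal"},
    option_for_state={"q0": "go_to_key", "q1": "go_to_door", "q_goal": "done"},
    initial_state="q0",
)
print(policy.step(set()))          # -> go_to_key
print(policy.step({"has_key"}))    # -> go_to_door
```

Because the automaton is a symbolic object, it can in principle be learned by a logic-based learner from execution traces and then redeployed over different low-level options in a new domain.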
The project will look in particular at the AnimalAI or Meta-World tasks.
Possible directions for improving the current SOTA include:
– Learning hierarchical meta-policies in the form of hierarchical automata, to facilitate inductive transfer of RL agents.
– Learning meta-policies over abstract observations instead of ground observations, so as to facilitate generalisability and scalability.
– Integrating the logic-based learning of meta-policies with object-centric methods for detecting properties in the environments, and with pre-trained options. This will build upon the recent work presented at IJCLR 2021.
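To make the second direction concrete (a minimal sketch under assumed names; the observation layout, the `abstract` function, and the `near(...)` predicate are illustrative, not part of the project), a ground observation can be lifted to a set of abstract propositions, so that a meta-policy conditions on facts such as `near(key)` rather than raw coordinates:

```python
# Hypothetical sketch: lifting ground observations to abstract propositions.
# A meta-policy learned over these propositions does not depend on the raw
# feature space, so it can transfer to domains with different ground features.

def abstract(observation, threshold=1.0):
    """Map a ground observation (object positions) to a set of propositions."""
    props = set()
    ax, ay = observation["agent"]
    for name, (ox, oy) in observation["objects"].items():
        # Emit near(object) when the object is within Manhattan distance
        # `threshold` of the agent.
        if abs(ax - ox) + abs(ay - oy) <= threshold:
            props.add(f"near({name})")
    return props

obs = {"agent": (2, 3), "objects": {"key": (2, 4), "door": (9, 9)}}
print(abstract(obs))  # -> {'near(key)'}
```

In the project itself, such propositions would come from object-centric detection methods rather than hand-written rules; the sketch only shows the interface between ground observations and the symbolic level.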