Recent advances in deep reinforcement learning (DRL) have allowed computer programs to beat humans at complex games such as chess and Go years earlier than originally projected. However, the state of the art (SOTA) in DRL still lacks some of the core cognitive capabilities we would like to extend to machines. Specifically, DRL lacks transferability. Transfer can be transductive or inductive. Transductive transfer refers to the problem of deploying a DRL model trained in a given source domain to a new target domain, which might have different features, in order to solve the same task. Their inability to generalise makes pure DRL methods unsuited to this problem. Inductive transfer refers instead to the situation where a DRL model trained to solve a specific task is deployed to solve a more general task (possibly within the same domain) that requires skills learned in the first task. Recently, hierarchical RL methods have tried to tackle inductive transfer, but they still tend to rely upon a pre-engineered structure of tasks and subtasks. Augmenting DRL with symbolic learning could enable DRL to learn general properties of the domains it explores, together with policies expressed at a higher level of abstraction (e.g. in terms of these general properties), which can be applied more easily to similar but different target domains. Similarly, the ability of symbolic learning to learn new knowledge by making use of not only labelled examples but also existing (partial) knowledge can give DRL the means to learn new skills by leveraging both exploration-driven evidence and already learned skills.
The goal of this project is to develop a novel neural-symbolic reinforcement learning approach to tackle transductive and inductive transfer by combining RL exploration of the environment with symbolic learning of high-level policies. The project will build on recent results in the SPIKE group, led by Prof Russo, on (i) combining low-level neural policies with high-level symbolic policies, and (ii) compositional learning of high-level symbolic policies in the form of (structured) automata. The project will look in particular at the AnimalAI or Meta-World tasks.
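To make the idea of pairing low-level policies with a high-level automaton more concrete, the following is a minimal illustrative sketch and not the SPIKE group's actual method. All names (`AutomatonPolicy`, `label`, `build_demo_policy`, the key-and-door task) are hypothetical. The sub-policies are fixed stand-in functions where trained neural networks would normally sit, and the automaton is written by hand here, whereas in the project it would be learned symbolically.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set, Tuple

Observation = Dict[str, float]
Action = str
Policy = Callable[[Observation], Action]
Labeller = Callable[[Observation], Set[str]]


@dataclass
class AutomatonPolicy:
    """High-level automaton that selects one low-level sub-policy per state."""
    initial: str
    accepting: str
    transitions: Dict[Tuple[str, str], str]  # (state, event) -> next state
    sub_policies: Dict[str, Policy]          # state -> low-level policy
    labeller: Labeller                       # observation -> symbolic events
    state: str = field(init=False)

    def __post_init__(self) -> None:
        self.state = self.initial

    def step(self, obs: Observation) -> Optional[Action]:
        # Abstract the raw observation into high-level symbolic events.
        events = self.labeller(obs)
        # Take the first enabled transition (sorted for determinism).
        for event in sorted(events):
            nxt = self.transitions.get((self.state, event))
            if nxt is not None:
                self.state = nxt
                break
        # The task is complete once the accepting state is reached.
        if self.state == self.accepting:
            return None
        # Otherwise delegate action selection to the current sub-policy.
        return self.sub_policies[self.state](obs)


def label(obs: Observation) -> Set[str]:
    """Hypothetical labelling function: thresholds features into events."""
    return {name for name in ("has_key", "door_open") if obs.get(name, 0.0) > 0.5}


def build_demo_policy() -> AutomatonPolicy:
    """Hand-written automaton for a toy 'fetch key, then open door' task."""
    return AutomatonPolicy(
        initial="find_key",
        accepting="done",
        transitions={
            ("find_key", "has_key"): "open_door",
            ("open_door", "door_open"): "done",
        },
        # Stand-ins for neural policies; each ignores the observation.
        sub_policies={
            "find_key": lambda obs: "move_to_key",
            "open_door": lambda obs: "move_to_door",
        },
        labeller=label,
    )
```

In this sketch only the automaton and the labelling function are task-specific. The same sub-policies could in principle be reused under a different automaton, which is the kind of reuse that inductive transfer calls for.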
Opportunities for improvements on the current SOTA include:
– More robust RL policy learning using inductive logic programming, in particular ILASP.
– Efficient grounding of environments using deep methods with a particular interest in interpretable generative models.
– Leveraging the overlap between multiple DRL sub-policies and training robust DRL policies for various skills in an efficient manner. This will lead to a novel solution for meta-DRL.