Reinforcement learning (RL) resembles human learning, with intelligence accumulated through experimentation. Yet to attain expert human-level performance on tasks such as Atari video games or chess, deep RL systems have required many orders of magnitude more training data than human experts themselves; this appears far too slow to offer a plausible model of human learning, and such systems do not leverage symbolic abstractions.
Two primary sources of this sample inefficiency are (a) incremental parameter adjustment through the commonly used gradient descent, and (b) weak inductive bias about the structure of the task. Together, these make it unclear to what extent deep RL can generalise to more abstract tasks, and hence which parts of the network learn long-term, stable, reusable, domain-independent knowledge such as common-sense concepts and reasoning in symbolic form. This connects with a long-standing, still unachieved goal of AI: beyond solving basic prediction tasks such as text classification, image captioning and speech recognition (through, e.g., attention), equipping AI architectures with a long-term, symbolic neural memory that can be effectively read from and written to. Although several such approaches have been proposed in recent years (e.g. Neural Turing Machines [1]), none has been fully implemented in differentiable architectures or adapted to specific use cases such as common-sense reasoning and learning via RL. Beyond their implications within AI, both of these techniques bear suggestive links to psychology and neuroscience. A well-understood symbolic common-sense knowledge memory component for RL could establish solid ground for safer and more trustworthy AI.
This project aims to address these two factors and enable deep RL to learn in a much more sample-efficient manner. To realise this, we consider the following directions:
1- Neural networks with a memory capacity provide a promising approach to meta-learning in deep networks and enable fast learning. We consider leveraging the episodic RL framework and storing individual experiences in memory (such as a working memory graph [3]). When a familiar state is encountered, the agent can retrieve the set of trajectories that have followed each candidate action in that state.
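The retrieval mechanism described above can be sketched in the spirit of episodic control: store per-(state, action) returns and, when a familiar state is encountered, estimate each candidate action's value by nearest-neighbour lookup over stored state embeddings. This is a minimal illustrative sketch, not the project's final design; the class name and the simple Euclidean nearest-neighbour rule are assumptions for illustration.

```python
import numpy as np

class EpisodicMemory:
    """Minimal episodic memory sketch: stores returns observed after
    (state, action) pairs and estimates action values for familiar
    states via k-nearest-neighbour lookup over state embeddings."""

    def __init__(self, num_actions, k=3):
        self.num_actions = num_actions
        self.k = k
        # One buffer of (state_embedding, return) entries per action.
        self.buffers = [[] for _ in range(num_actions)]

    def store(self, state, action, ret):
        """Record the return observed after taking `action` in `state`."""
        self.buffers[action].append((np.asarray(state, dtype=float), ret))

    def value(self, state, action):
        """Average return over the k nearest stored states for `action`."""
        buf = self.buffers[action]
        if not buf:
            return 0.0
        state = np.asarray(state, dtype=float)
        dists = [np.linalg.norm(state - s) for s, _ in buf]
        nearest = np.argsort(dists)[: self.k]
        return float(np.mean([buf[i][1] for i in nearest]))

    def act(self, state):
        """Greedy action under the episodic value estimates."""
        return int(np.argmax([self.value(state, a)
                              for a in range(self.num_actions)]))
```

In the actual project, the flat buffers would be replaced by richer structures such as the working memory graph of [3], and the state embeddings would come from the agent's learned representation.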
2- Use case in RL and common-sense reasoning. As a starting experiment, we will use existing symbolic representations of common-sense knowledge, such as the Common Sense Knowledge Graph (CSKG [2]), as the initial symbolic representation of the neural memory. In particular, the experiment will compare (a) learning efficiency and (b) the performance of the resulting models in an RL task in which CSKG embeddings replace layers of the neural architecture, under the hypothesis that these embeddings encode example-induced common-sense knowledge. If higher efficiency and similar performance are found, the memory is fulfilling its goal of providing common-sense knowledge that is general enough that it need not be learned again during training.
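The experimental setup in direction 2 amounts to replacing a trainable embedding layer with a fixed lookup over pretrained knowledge-graph embeddings. A minimal sketch follows; the concept names and random vectors are placeholders for the published CSKG embeddings, and the class name is a hypothetical label introduced here for illustration.

```python
import numpy as np

# Placeholder pretrained concept embeddings; in the real experiment these
# would be loaded from the published CSKG embedding files.
concepts = ["door", "key", "open"]
pretrained = {c: np.random.RandomState(i).randn(4) for i, c in enumerate(concepts)}

class FrozenKGEmbedding:
    """Embedding lookup whose weights come from a knowledge graph and are
    held fixed during RL training, standing in for a learned layer."""

    def __init__(self, pretrained_vectors):
        self.index = {c: i for i, c in enumerate(pretrained_vectors)}
        self.weights = np.stack(list(pretrained_vectors.values()))
        self.weights.flags.writeable = False  # frozen: never updated

    def __call__(self, concept_names):
        rows = [self.index[c] for c in concept_names]
        return self.weights[rows]

layer = FrozenKGEmbedding(pretrained)
features = layer(["door", "key"])  # (2, 4) array fed to the policy network
```

Comparing training curves of an agent using this frozen layer against one that learns the same layer from scratch gives the efficiency/performance comparison the experiment calls for.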
Fast Reinforcement Learning using Memory-Augmented Neural Networks
[1] Graves, A., Wayne, G., & Danihelka, I. (2014). Neural Turing machines. arXiv preprint arXiv:1410.5401.
[2] Ilievski, F., Szekely, P., & Zhang, B. (2021). CSKG: The commonsense knowledge graph. In The Semantic Web: 18th International Conference, ESWC 2021, Virtual Event, June 6–10, 2021, Proceedings 18 (pp. 680–696). Springer International Publishing.
[3] Loynd, R., et al. (2020). Working memory graphs. In International Conference on Machine Learning. PMLR.
Project ID
STAI-CDT-2023-KCL-27
Supervisor
Yali Du (yali.du@kcl.ac.uk, https://yalidu.github.io/)
Albert Meroño Peñuela (albert.merono@kcl.ac.uk, https://www.albertmeronyo.org)