Integrating Sub-symbolic and Symbolic Reasoning for Value Alignment

An important long-term concern regarding the ethical impact of AI is the so-called ‘value alignment problem’: how to ensure that the decisions of autonomous AIs are aligned with human values. Addressing this problem, as well as the broader challenge of developing Artificial General Intelligence, will require the integration of sub-symbolic and symbolic reasoning, so that agents can both implement higher-level cognitive functions and communicate with humans. In particular, higher-level cognitive functions such as transfer learning and long-term planning are needed to facilitate moral decision making, and communicative interactions with humans are required by state-of-the-art solutions to value alignment [1,2,3].

The proposed project will build on a conceptual framework for integrating sub-symbolic and symbolic reasoning [4]. The essential idea is to formalise mappings between sub-symbolic ‘raw’ data and symbolic abstraction models of this raw data that usefully represent objects in the environment, their properties and their interactions. Intuitively, a mapping between two layers verifies the utility of the higher layer’s abstraction if there is an isomorphic relationship between transformations in the lower and higher layers and so, by transitivity, between the lowest-level raw data and the highest-level abstract symbolic representation.
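One possible reading of this condition is as a commuting square: abstracting and then transforming at the higher layer should give the same result as transforming at the lower layer and then abstracting. The Python sketch below illustrates that reading on a toy example. It is not the formalisation given in [4]; the raw layer (a one-hot row of pixels), the symbolic layer (a cell index) and all function names are assumptions introduced purely for illustration.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical toy layers (assumptions, not taken from [4]):
Raw = Tuple[int, ...]   # raw layer: a row of pixels with exactly one bright pixel
Symbol = int            # symbolic layer: index of the cell holding the object

WIDTH = 4


def abstract(raw: Raw) -> Symbol:
    """Map a raw row to the symbolic position of its single bright pixel."""
    return raw.index(1)


def shift_right_raw(raw: Raw) -> Raw:
    """Lower-layer transformation: rotate every pixel one cell to the right."""
    return raw[-1:] + raw[:-1]


def move_right_symbolic(pos: Symbol) -> Symbol:
    """Higher-layer transformation: move the object one cell right, wrapping."""
    return (pos + 1) % WIDTH


def move_right_no_wrap(pos: Symbol) -> Symbol:
    """A deliberately mismatched higher-layer transformation (no wrap-around)."""
    return pos + 1


def commutes(
    samples: Iterable[Raw],
    abstraction: Callable[[Raw], Symbol],
    lower: Callable[[Raw], Raw],
    higher: Callable[[Symbol], Symbol],
) -> bool:
    """Check abstraction(lower(x)) == higher(abstraction(x)) on every sample."""
    return all(abstraction(lower(x)) == higher(abstraction(x)) for x in samples)


# All one-hot rows of length WIDTH.
samples = [tuple(1 if i == k else 0 for i in range(WIDTH)) for k in range(WIDTH)]

faithful = commutes(samples, abstract, shift_right_raw, move_right_symbolic)
unfaithful = commutes(samples, abstract, shift_right_raw, move_right_no_wrap)
```

Here `faithful` holds because the symbolic move mirrors the raw shift on every sample, while `unfaithful` fails on the wrap-around case, which is the kind of mismatch that would count against the utility of the higher layer’s abstraction.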

The proposed research will leverage insights and mathematical (in particular Bayesian) techniques developed within state-of-the-art “predictive coding” models of human cognition [5]. In these models, the problem of mapping correlations between representational layers, each successively abstracting from the layers below, is addressed by having higher-layer models of the environment generate predictions of the input expected from lower layers. Errors in these predictions then serve to refine or revise the higher-layer models.
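The predict-then-correct loop described here can be illustrated with a minimal Bayesian sketch. It assumes a single scalar quantity, a Gaussian belief held by the higher layer and Gaussian noise on the lower-layer input; the names `GaussianBelief`, `predict` and `update` are hypothetical. It is a standard conjugate Gaussian update, not the specific models discussed in [5] or the method the project will develop.

```python
from dataclasses import dataclass


@dataclass
class GaussianBelief:
    """Higher-layer belief about a scalar cause (an illustrative assumption)."""
    mean: float
    variance: float


def predict(belief: GaussianBelief) -> float:
    """The higher layer's prediction of the input expected from the lower layer."""
    return belief.mean


def update(belief: GaussianBelief, observation: float,
           noise_variance: float) -> GaussianBelief:
    """Revise the belief using the prediction error (conjugate Gaussian update)."""
    error = observation - predict(belief)
    # The gain weights the error by how uncertain the belief is relative to the input.
    gain = belief.variance / (belief.variance + noise_variance)
    return GaussianBelief(
        mean=belief.mean + gain * error,
        variance=(1.0 - gain) * belief.variance,
    )


# Usage: repeated input of 2.0 pulls the belief towards 2.0 and shrinks its variance.
belief = GaussianBelief(mean=0.0, variance=1.0)
first = update(belief, observation=2.0, noise_variance=1.0)

current = belief
for _ in range(50):
    current = update(current, observation=2.0, noise_variance=1.0)
```

Each update moves the belief by a fraction of the prediction error, so the error on a repeated input shrinks over time. This is the sense in which errors refine the higher-layer model.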

[1] S. Modgil. Many Kinds of Minds are Better than One: Value Alignment Through Dialogue. In: Workshop on Argumentation and Philosophy (co-located with COMMA’18).

[2] S. Modgil. Dialogical Scaffolding for Human and Artificial Agent Reasoning. In: 5th International Workshop on Artificial Intelligence and Cognition (AIC 2017), 73–86, 2017.

[3] D. Hadfield-Menell, A. Dragan, P. Abbeel, S. Russell. Cooperative inverse reinforcement learning. In: NIPS’16: Proceedings of the 30th International Conference on Neural Information Processing Systems, 3916–3924, 2016.

[4] J. Pober, M. Luck, O. Rodrigues. From Subsymbolic to Symbolic: A Blueprint for Investigation. In: NESY 2022: 16th International Workshop on Neural-Symbolic Learning and Reasoning, 88–93, 2022.

[5] A. Clark. Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford University Press, 2015.

Sanjay Modgil

Odinaldo Rodrigues