Trustful Ontology Engineering and Reasoning through Provenance

Ontologies have become fundamental AI artifacts for providing knowledge to intelligent systems. The concepts and relationships formalised in these ontologies are frequently used to semantically annotate data, helping machines to understand its meaning as humans do and empowering them to provide better search results in products from Google (through knowledge panels), Apple, and Amazon (through Siri and Alexa voice-assisted answers). To create these ontologies, various (frequently collaborative) ontology engineering methodologies have been developed and successfully applied in creating ontologies for medicine, geosciences, and government data.

Cultural heritage institutions have started to use these methodologies to create their own ontologies. However, they have found that their datasets, which capture subtle aspects of our history and culture, are especially sensitive to socially unsafe and untrustworthy practices: history rewriting, bigotry, and unverified political claims are all too easy to amplify if the ontologies annotating the data are inappropriately engineered. Reasoning engines (e.g. in description logics) can still derive logically valid conclusions from bogus annotations, but they lack the means to assess which annotations can be trusted and which may come from untrusted or unvalidated sources and methodologies.

Historians rely on data provenance – a record trail of the origin of a piece of data and its transformation processes – as a tool to assess the reliability, quality, and trustworthiness of cultural and historical datasets. Surprisingly, provenance plays little to no role in formal ontology engineering methodologies; although some mechanisms for ontology testing exist (e.g. competency questions, reasoning), a formal, machine-processable provenance trail is never used to document engineering decisions, assess the quality of the created ontology, or weigh statements in the reasoning process. How can the provenance verification mechanisms of historians be integrated into ontology engineering methodologies? How can provenance models be used to design safe, trustful, and historically aware reasoning engines that give more importance to provenance-rich statements?

This project uses the W3C PROV provenance model to devise novel, safe, and trustful ways of developing ontologies for datasets that are especially sensitive to historical, social, and cultural manipulation, together with reasoning processes that give more importance to provenance-rich ontologies. To this end, it embeds data provenance trails in ontology engineering methodologies to document design decisions and measure their consequences. More specifically, the project:

– Investigates the practices of historians and digital scholars in knowledge and ontology engineering
– Extends W3C PROV to support such practices
– Proposes novel ontology engineering and ontology testing methodologies based on these PROV extensions
– Devises new reasoning algorithms that are aware of, and leverage, ontologies engineered this way for a trusted reasoning process
– Building on all of the above, develops novel metrics of trust in two use cases: European musical cultural heritage (Polifonia) and digital newspapers
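To make the idea of provenance-weighted reasoning concrete, the sketch below shows one way a reasoner might prefer statements with richer PROV-style trails. All names and the trust metric here are hypothetical illustrations, not the project's actual design: each ontology statement carries a trail of agents (prov:wasAttributedTo), activities (prov:wasGeneratedBy), and sources (prov:hadPrimarySource), and a toy trust weight counts how many of these dimensions are documented.

```python
# Minimal sketch of provenance-weighted statement selection.
# Hypothetical data model and metric; not the project's actual design.
from dataclasses import dataclass, field

@dataclass
class ProvTrail:
    agents: list = field(default_factory=list)      # prov:wasAttributedTo
    activities: list = field(default_factory=list)  # prov:wasGeneratedBy
    sources: list = field(default_factory=list)     # prov:hadPrimarySource

def trust_weight(trail: ProvTrail) -> float:
    """Toy trust metric: fraction of the three PROV dimensions documented."""
    dims = [trail.agents, trail.activities, trail.sources]
    return sum(bool(d) for d in dims) / len(dims)

# Two competing annotations of the same cultural-heritage item
# (invented identifiers for illustration only)
rich = ProvTrail(agents=["curator:alice"],
                 activities=["archival-verification"],
                 sources=["museum-catalogue-1897"])
poor = ProvTrail()  # no recorded provenance at all

statements = [
    (("ex:score42", "ex:composedBy", "ex:Monteverdi"), rich),
    (("ex:score42", "ex:composedBy", "ex:Anonymous"), poor),
]

# A provenance-aware reasoner could rank statements by trust
# before (or while) deriving conclusions from them.
ranked = sorted(statements, key=lambda s: trust_weight(s[1]), reverse=True)
best = ranked[0][0]  # the statement with the richest provenance trail
```

In a real system the weight would of course be far more nuanced (e.g. accounting for the trustworthiness of agents and the validation activities performed), but the pattern is the same: provenance-rich statements dominate the reasoning process, provenance-poor ones are down-weighted or flagged.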

With provenance securing ontology engineering and reasoning, we will ensure that ontologies do not propagate biases and manipulative views on disputed subjects, while also allowing the quality and trustworthiness of the ontology engineering methodology itself to be assessed.

Hitzler, P., 2021. A review of the Semantic Web field. Communications of the ACM, 64(2), pp.76-83.
Groth, P. and Moreau, L., 2013. PROV-Overview: An Overview of the PROV Family of Documents. W3C Working Group Note.
Simperl, E. and Luczak-Rösch, M., 2014. Collaborative ontology engineering: a survey. The Knowledge Engineering Review, 29(1).

Albert Meroño Peñuela


AI Provenance, Logic