Knowledge graphs and knowledge bases are forms of symbolic knowledge representation used across AI applications. Both refer to a family of technologies that organise data for easier access, capture information about people, places, events, and other entities of interest, and forge connections between them. With the resurgence of AI, symbolic knowledge representations have become ubiquitous: they are now used extensively in everything from search engines and chatbots to product recommenders and autonomous systems, especially in the context of neuro-symbolic approaches.
Knowledge engineering is the field that encompasses the technical and social aspects of building knowledge-based AI systems. In its most recent manifestation, it involves complex human-machine workflows, including knowledge acquisition from experts, crowdsourced entity typing and reconciliation, argumentation and discussion support, information extraction algorithms across different data modalities, and database lifting. The result is used in AI systems in multiple ways – from adding context when data is scarce to generating explanations.
The context of this project is Wikidata, the largest knowledge engineering project to date, containing machine-understandable information – more than a billion statements – about real-world entities. Wikidata is created by a community of several tens of thousands of active editors and more than 300 AI bots. Given its scale and importance, Wikidata offers a unique opportunity to improve the quality of the symbolic knowledge we use in AI systems. This includes both empirical research to understand how the knowledge was created and how it evolved (using a combination of machine learning, argumentation methods, and provenance analytics) and novel methods and human-AI workflows for quality assurance and repair. The project would:
– first understand how instance and ontological data have evolved over the past nine years in terms of quality. This includes, among other things, the use of networks of small vocabularies expressed in ShEx (the Shape Expressions language) as an alternative to more organic forms of ontology building;
– develop methods (using information extraction from different data modalities) to generate new ShEx shapes automatically or to evolve existing ones;
– use computational argumentation methods to reveal potential conflicts and quality issues;
– test how the new approaches improve the explainability of downstream AI applications.
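The kind of shape-based quality check referred to in the steps above can be sketched in a few lines of Python. This is an illustrative toy, not real ShEx: it assumes a dict-based statement store and a made-up shape format consisting of per-property cardinality bounds. The property IDs used are actual Wikidata ones (P31 = instance of, P569 = date of birth), but the shape and the item are invented examples.

```python
# Toy ShEx-style shape check over Wikidata-like statements.
# A "shape" maps a property ID to (min_cardinality, max_cardinality);
# max_cardinality of None means unbounded. Real ShEx is far richer
# (value constraints, shape references, logical operators).

def conforms(entity, shape):
    """Return a list of cardinality violations of `shape` by `entity`."""
    violations = []
    for prop, (min_card, max_card) in shape.items():
        n = len(entity.get(prop, []))
        if n < min_card:
            violations.append(f"{prop}: expected at least {min_card}, found {n}")
        if max_card is not None and n > max_card:
            violations.append(f"{prop}: expected at most {max_card}, found {n}")
    return violations

# Hypothetical shape for a "human" item: exactly one instance-of
# statement (P31), at most one date of birth (P569).
HUMAN_SHAPE = {"P31": (1, 1), "P569": (0, 1)}

# An invented item with one P31 value (Q5 = human) and one birth date.
item = {"P31": ["Q5"], "P569": ["1879-03-14"]}
print(conforms(item, HUMAN_SHAPE))  # → []
```

A quality-assurance workflow along these lines would run such shapes over revisions of an item and surface violations to editors, which is where the argumentation and repair methods in the list above would come in.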