Trusted Collective Intelligence through Norms, Ontologies and Provenance

Collective intelligence (CI) communities are among the greatest examples of collaboration, capability, and creativity of the digital age. CI communities allow large groups of individuals to work together towards a shared goal. CI platforms such as Wikipedia, Zooniverse, and Kaggle gather contributions from millions worldwide, with a substantial impact on science, education, the economy, and the environment. Many of them have proven effective at creating and curating labelled datasets for machine learning at scale and with speed [2, 4].

As these systems continue to grow, so does the challenge of delivering and maintaining them, and many are turning to AI to manage content and support contributors. They use data-driven AI tools to filter the labelling tasks that the community gets to see, infer correct labels, and understand participation profiles.

Such automated decisions introduce trust concerns [3]: for the data that they produce; for the downstream applications that use the data; and for the sociotechnical ecosystem in which the data is produced and its ability to tackle adversarial attacks, groupthink, or polarisation. Most prior research has looked at the question of trust from an individual perspective: how and when does someone who interacts with an AI capability trust that AI? However, state-of-the-art models for automated image annotation and text generation have been found to be biased, unsafe, and untrustworthy when deployed in real-world scenarios (e.g. medical contexts [1]), and their deployment in collective intelligence platforms can potentially magnify these issues. Collective intelligence scenarios, where teams, groups, and communities engage with AI, have distinct challenges; for instance, any approach to explaining what the AI does has to consider the diversity of backgrounds, opinions, beliefs, and motivations of multiple participants.

This PhD project will aim to understand, while making contributions to model-driven AI, how we could design better AI-driven processes that support trusted interactions and collective decision making. To do so, the project will:
– First use various techniques to extract, from primarily text-based CI community governance documents, a set of formal norms, ontological axioms, and provenance traces setting the limitations and constraints of bespoke automatically generated texts and labels.
– Then use the extracted norms, ontological axioms, and provenance traces to build a large, structured knowledge graph of trusted interactions and decision making within collective intelligence communities. This knowledge graph will then be injected into downstream AI tasks (e.g. image captioning, language models) to constrain their behaviour according to the community’s preferences and governance.
– Finally, building upon a literature review of the main uses of AI processes in CI systems, engage with a representative range of volunteer communities to understand how they engage with automated capabilities. This will be done through workshops that establish an understanding of current practices and of the challenges faced by the community, specifically around interacting with AI in the collective.
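
To make the first two steps more concrete, the following is a minimal sketch, not the project's actual method. It assumes norms have already been extracted by hand, reduces each norm to a single regulated term, and stores it as triples together with the document it came from. The class `Norm`, the functions `build_graph`, `violations` and `select_output`, the predicate names (`wasDerivedFrom` is borrowed loosely from the PROV vocabulary) and the example norms are all hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Norm:
    """A hypothetical norm extracted from a governance document."""
    norm_id: str
    modality: str  # "prohibit" or "require"
    term: str      # the term whose presence is regulated
    source: str    # provenance: the document the norm came from


def build_graph(norms):
    """Store each norm as (subject, predicate, object) triples."""
    triples = set()
    for n in norms:
        triples.add((n.norm_id, "hasModality", n.modality))
        triples.add((n.norm_id, "regulatesTerm", n.term))
        triples.add((n.norm_id, "wasDerivedFrom", n.source))
    return triples


def lookup(graph, subject, predicate):
    """Return the single object stored for a subject and predicate."""
    return next(o for s, p, o in graph if s == subject and p == predicate)


def violations(graph, text):
    """Return (norm_id, source) for every norm that the text breaks."""
    words = set(text.lower().split())
    broken = []
    for norm_id in sorted({s for s, p, _ in graph if p == "hasModality"}):
        modality = lookup(graph, norm_id, "hasModality")
        present = lookup(graph, norm_id, "regulatesTerm") in words
        if (modality == "prohibit" and present) or (
            modality == "require" and not present
        ):
            broken.append((norm_id, lookup(graph, norm_id, "wasDerivedFrom")))
    return broken


def select_output(graph, candidates):
    """Return the first generated candidate that breaks no norm, else None."""
    for candidate in candidates:
        if not violations(graph, candidate):
            return candidate
    return None


# Hypothetical norms and sources, for illustration only.
graph = build_graph([
    Norm("n1", "prohibit", "famous", "style-guide"),
    Norm("n2", "require", "photo", "captioning-guide"),
])
captions = ["A famous photo of a bridge", "A photo of a bridge"]
print(select_output(graph, captions))
print(violations(graph, captions[0]))
```

Because every violation is returned with the document it was derived from, a rejected caption can be traced back to the community rule that excluded it. A real system would need far richer norm semantics than single-term matching.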

The results of this user-centred design will feed into novel interventions to support CI communities that use AI. A range of prototypes of increasing fidelity will be built to explore key AI interventions directly with the CI communities. The approaches and techniques for trustworthy human-AI teams will be evaluated with end-user communities such as Wikipedia. The result of the project will be a framework for trusted AI within collectives, providing a basis from which collectives might adopt and build trustworthy AI processes into their work.

[1] Challen, R., Denny, J., Pitt, M., Gompels, L., Edwards, T. and Tsaneva-Atanasova, K., 2019. Artificial intelligence, bias and clinical safety. BMJ Quality & Safety, 28(3), pp.231-237.

[2] Kaffee, L.A., ElSahar, H., Vougiouklis, P., Gravier, C., Laforest, F., Hare, J. and Simperl, E., 2018. Mind the (language) gap: Generation of multilingual Wikipedia summaries from Wikidata for ArticlePlaceholders. In European Semantic Web Conference, pp.319-334.

[3] Smith, C.E., Yu, B., Srivastava, A., Halfaker, A., Terveen, L. and Zhu, H., 2020. Keeping community in the loop: Understanding Wikipedia stakeholder values for machine learning-based systems. In Proceedings of ACM CHI 2020.

[4] Willi, M., Pitman, R.T., Cardoso, A.W., Locke, C., Swanson, A., Boyer, A., Veldthuis, M. and Fortson, L., 2019. Identifying animal species in camera trap images using deep learning and citizen science. Methods in Ecology and Evolution, 10(1), pp.80-91.


Prof Elena Simperl

Dr Timothy Neate


AI Provenance, Norms