Language models learned from data have become prevalent in AI systems, but they struggle to identify undesired behaviour that poses risks to society, such as offensive language. The task of automatically detecting offensive language has attracted significant attention in Natural Language Processing (NLP) because of its high social impact: policy makers and online platforms can leverage computational detection methods to counter online abuse at scale. State-of-the-art methods for automatic offensive language detection, typically relying on ensembles of transformer-based language models such as BERT, are trained on large-scale annotated datasets.
Detecting offensive language is made harder by the fact that the meaning of words changes over time: conventional, neutral language can turn offensive on short time scales, following rapid shifts in social dynamics or political events. The word karen, for example, evolved in 2020 from a neutrally connoted personal name into a “pejorative term for a white woman perceived as entitled or demanding beyond the scope of what is normal”. Adapting to such changes in meaning is a key characteristic of intelligent behaviour, yet current AI systems developed to process language computationally are not equipped to react to them: the artificial neural networks they are built on do not capture the full semantic range of words, which only becomes available through additional knowledge (e.g. author, genre, origin, register) typically contained in external, symbolic, linguistic and world knowledge bases.
This project aims to develop new time-sensitive computational methods for offensive language detection that combine distributional information from large textual datasets with symbolic knowledge representations. Specifically, the project will build representations of word meaning from textual data and from external knowledge bases containing relevant linguistic and world knowledge, such as lexicons, thesauri, semantic networks, knowledge graphs (e.g. Wikidata), and ontologies. It will embed this knowledge into distributional word vectors derived from time-sensitive text data (diachronic corpora) and explore various approaches for combining these representations. To achieve these goals, we envisage the following specific tasks:
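As a first intuition for combining the two representation types, the simplest baseline is vector concatenation: a corpus-derived vector for a given time slice is joined with a vector derived from a knowledge base. The sketch below uses invented toy vectors and a hypothetical word (karen, with its 2015 and 2020 corpus slices); it is an illustration of the idea, not the project's eventual method.

```python
# Toy sketch: combining a distributional vector (from one diachronic
# corpus slice) with a knowledge-base vector by concatenation.
# All vector values here are invented for illustration.

def combine(dist_vec, kb_vec):
    """Concatenate a corpus-derived vector with a KB-derived vector."""
    return dist_vec + kb_vec  # list concatenation

# Hypothetical 3-d distributional vectors per time slice for "karen"
dist_2015 = [0.1, 0.8, 0.3]
dist_2020 = [0.7, 0.2, 0.9]

# Hypothetical 2-d vector derived from a knowledge graph (e.g. Wikidata)
kb = [0.5, 0.4]

# The time-slice vectors differ, the KB component is shared
v_2015 = combine(dist_2015, kb)
v_2020 = combine(dist_2020, kb)
print(len(v_2015))  # 5
```

Concatenation keeps the two information sources separable downstream; the project will also explore tighter combinations, discussed in the tasks below.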
- The project starts with a classic knowledge engineering task, in which we will use description logics to formalise the dynamic semantics of offensive language as an ontology encoding the social and cultural phenomena that turn neutral words into offensive ones. To do so, we will draw on existing examples and well-understood use cases, following a partially inductive approach.
- Using this ontology, we will annotate existing datasets and apply reasoning to derive non-trivial inferences about the concepts motivating the formation of offensive language. From these derivations, we will project both asserted and inferred knowledge into geometric space as knowledge graph embeddings.
- We will combine the embeddings derived from the reasoning process with embeddings trained on new textual datasets. We will evaluate the approach with various techniques, including verbalisation/lexicalisation, joint specialization methods, post-processing retrofitting models, and post-specialization approaches, as well as ways to inject external knowledge into pre-trained representations such as ELMo and BERT.
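To preview the kind of inference the first task's reasoner would perform, consider a toy forward-chaining rule over (subject, predicate, object) facts: if a word carries a derogatory connotation in a given time slice and targets a protected group, it is inferred to be offensive in that slice. The vocabulary and facts below are invented for illustration; the actual ontology will be expressed in description logics, not Python.

```python
# Toy forward-chaining over (subject, predicate, object) triples,
# previewing the style of inference an ontology reasoner performs.
# Predicates and facts are invented for illustration.

facts = {
    ("karen", "has_connotation_2015", "neutral"),
    ("karen", "has_connotation_2020", "derogatory"),
    ("karen", "targets", "protected_group"),
}

def infer_offensive(facts):
    """Rule: derogatory connotation in year Y + targets a protected
    group => offensive in year Y."""
    inferred = set(facts)
    for (word, pred, obj) in facts:
        if pred.startswith("has_connotation_") and obj == "derogatory":
            year = pred.rsplit("_", 1)[1]
            if (word, "targets", "protected_group") in facts:
                inferred.add((word, "offensive_in", year))
    return inferred

out = infer_offensive(facts)
print(("karen", "offensive_in", "2020") in out)  # True
```

The time-indexed predicate is what makes the inference diachronic: the 2015 fact, being neutral, triggers no offensive-in-2015 conclusion.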
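The knowledge graph embedding step in the second task could, for instance, follow a translational model such as TransE, which scores a triple (h, r, t) by the distance between h + r and t. A stdlib-only sketch with invented toy embeddings and a hypothetical relation; the project does not commit to this particular model.

```python
import math

def transe_score(h, r, t):
    """TransE plausibility: negative Euclidean distance ||h + r - t||.
    Scores closer to 0 mean the triple is more plausible."""
    return -math.sqrt(sum((hi + ri - ti) ** 2
                          for hi, ri, ti in zip(h, r, t)))

# Invented 2-d toy embeddings for entities and one relation
karen   = [0.2, 0.1]
slur    = [0.9, 0.8]
teacher = [0.1, 0.9]
used_as = [0.7, 0.7]   # hypothetical relation "used_as_pejorative"

# An asserted triple should score higher than a corrupted one
good = transe_score(karen, used_as, slur)
bad  = transe_score(karen, used_as, teacher)
print(good > bad)  # True
```

During training, entity and relation vectors are adjusted so that asserted (and, in our case, reasoner-inferred) triples score higher than corrupted ones, which is how inferred knowledge ends up shaping the geometric space.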
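Of the combination techniques listed in the last task, post-processing retrofitting is the most self-contained to illustrate: distributional vectors are iteratively pulled toward the average of their neighbours in a lexical resource while staying close to their original positions. The sketch below uses toy vectors and a made-up neighbour graph linking the 2020 sense of karen to its paraphrase terms; it is a minimal illustration, not the project's implementation.

```python
def retrofit(vectors, neighbours, iterations=10, alpha=1.0, beta=1.0):
    """Iteratively move each vector toward the mean of its graph
    neighbours while anchoring it to its original position."""
    new = {w: list(v) for w, v in vectors.items()}
    for _ in range(iterations):
        for word, nbrs in neighbours.items():
            nbrs = [n for n in nbrs if n in new]
            if not nbrs:
                continue
            for i in range(len(new[word])):
                nbr_sum = sum(new[n][i] for n in nbrs)
                new[word][i] = ((alpha * vectors[word][i] + beta * nbr_sum)
                                / (alpha + beta * len(nbrs)))
    return new

# Toy distributional vectors (invented values)
vecs = {"karen": [1.0, 0.0], "entitled": [0.0, 1.0], "demanding": [0.0, 1.0]}
# Hypothetical lexicon edges linking the 2020 sense to its paraphrases
graph = {"karen": ["entitled", "demanding"]}

fitted = retrofit(vecs, graph)
```

After retrofitting, the vector for karen has moved toward its lexicon neighbours, encoding the knowledge-base signal directly in the distributional space; run per time slice, this yields time-sensitive, knowledge-enriched representations.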