Robustness of argument mining models

The standard approach to evaluating machine learning models is to use held-out data and report performance metrics such as accuracy and F1. Whilst these metrics summarise a model's performance, they are ultimately aggregate statistics: they offer little insight into how the model behaves in particular situations, where it succeeds or fails, and thus how robust it really is.
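To make the aggregate nature of these metrics concrete, here is a minimal sketch of held-out evaluation: both accuracy and macro-F1 are computed in pure Python over toy predictions for a hypothetical binary argument/non-argument task (the labels and data are illustrative, not from any real system).

```python
def accuracy(y_true, y_pred):
    """Fraction of held-out examples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = set(y_true) | set(y_pred)
    f1s = []
    for label in labels:
        tp = sum(t == p == label for t, p in zip(y_true, y_pred))
        fp = sum(p == label and t != label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Toy held-out labels for an illustrative argument/non-argument task.
y_true = ["arg", "arg", "none", "none", "arg", "none"]
y_pred = ["arg", "none", "none", "none", "arg", "arg"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

Note that the two scores above collapse six individual successes and failures into single numbers: two quite different error patterns (e.g. all mistakes on negated sentences versus mistakes spread evenly) can yield identical accuracy and F1, which is exactly the limitation motivating robustness evaluation.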

In the field of natural language processing (NLP), several frameworks have been developed to evaluate the robustness of NLP models [1, 2, 3, 4]. These focus on well-established tasks such as sentiment analysis and natural language inference, but overlook growing research areas such as argument mining (see [5] for our work on this).
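A common building block in such frameworks is an invariance test: apply a small, label-preserving perturbation to each input and measure how often the model's prediction flips. The sketch below assumes a hypothetical keyword-based classifier (`toy_model`) as a stand-in for a trained argument-mining model, and uses random adjacent-character swaps as the perturbation; the function names and data are illustrative, not taken from any specific framework.

```python
import random

def typo_perturb(text, rng):
    """Swap two adjacent characters in one random word (label-preserving noise)."""
    words = text.split()
    idxs = [i for i, w in enumerate(words) if len(w) > 3]
    if not idxs:
        return text
    i = rng.choice(idxs)
    w = words[i]
    j = rng.randrange(len(w) - 1)
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)

def invariance_failure_rate(model, texts, n_perturbations=5, seed=0):
    """Fraction of inputs whose prediction flips under small perturbations."""
    rng = random.Random(seed)
    failures = 0
    for text in texts:
        original = model(text)
        if any(model(typo_perturb(text, rng)) != original
               for _ in range(n_perturbations)):
            failures += 1
    return failures / len(texts)

# Hypothetical stand-in for a trained classifier: flags a sentence as a
# claim if it contains the exact token "should" (deliberately brittle).
def toy_model(text):
    return "claim" if "should" in text.split() else "other"

texts = ["we should ban smoking in parks",
         "the weather was pleasant yesterday"]
rate = invariance_failure_rate(toy_model, texts)
```

Because `toy_model` matches the exact token "should", a single typo in that word flips its prediction, so the failure rate exposes a brittleness that held-out accuracy on clean text would never reveal.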

Computational argumentation is a research area in NLP which encompasses several tasks such as argument mining (the automatic identification of natural language arguments and their relations from text) and argument quality assessment, amongst others (see [6] for a survey). Argument mining has been applied to a wide range of domains: persuasive essays, scientific articles, Wikipedia articles, news articles, online debates, product reviews, social media, legal documents, and political debates.

The project will focus on developing robust models for argument mining and on novel algorithms to explain the robustness of these models.


Project ID