Improving Robustness of Pre-Trained Language Models

Recent progress in Natural Language Understanding (NLU) has largely been demonstrated on tasks such as natural language inference, reading comprehension and question answering. We have witnessed a paradigm shift in NLP from fine-tuning a large-scale pre-trained language model (PLM) on task-specific data to prompt-based learning, where a description of the task is embedded in the input to the PLM so that the same language model can be used for multiple tasks. Although prompt-based learning approaches built on PLMs have achieved state-of-the-art performance in many NLP tasks, they are susceptible to adversarial attacks: even a small perturbation of an input (e.g., paraphrasing questions and/or answers in QA tasks) can cause a dramatic drop in a model's performance, showing that such models rely largely on shallow cues.
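
To make the prompt-based setting concrete, the sketch below reuses one masked language model for two tasks by changing only the prompt template, and then re-runs a paraphrased input to illustrate the kind of perturbation that can expose reliance on shallow cues. It assumes the Hugging Face transformers library and the bert-base-uncased checkpoint; these are illustrative choices, not a commitment of the project.

    from transformers import pipeline

    # One pre-trained masked language model serves multiple tasks.
    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    # Sentiment analysis framed as slot filling via a prompt template.
    review = "The film was a complete waste of time."
    print(fill_mask(f"{review} Overall, it was a [MASK] movie.",
                    targets=["good", "bad"]))

    # Topic classification with the same model; only the template changes.
    headline = "Stocks rallied after the central bank cut interest rates."
    print(fill_mask(f"{headline} This news is about [MASK].",
                    targets=["sports", "business"]))

    # A small paraphrase of the first input; comparing the two score
    # distributions probes whether the prediction rests on surface cues.
    paraphrase = "Watching the film was time entirely wasted."
    print(fill_mask(f"{paraphrase} Overall, it was a [MASK] movie.",
                    targets=["good", "bad"]))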

This project will focus on addressing the robustness of PLMs. It has been extensively studied that machine learning models built without consideration of causality may capture spurious correlations and fail on Out-Of-Distribution (OOD) data. To address spurious correlations, we can incorporate causal knowledge about the relationship between observations and output labels into model learning. Such causal knowledge can be obtained from external knowledge sources such as ATOMIC, pre-defined by users in symbolic forms such as logic rules, or inferred from counterfactual examples generated by automatic text rewriting. Training on counterfactuals can improve OOD generalisation and reduce noise sensitivity. Counterfactual data augmentation can also be used to reduce bias in PLMs, as sketched below.
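
As a minimal illustration of counterfactual data augmentation for bias reduction, the sketch below generates gender-swapped counterfactuals from a hand-written swap lexicon and adds them to the training data with the original label. The lexicon and example are deliberately simplistic assumptions; the automatic text rewriting discussed above would be far richer.

    import re

    # A hypothetical, deliberately tiny swap lexicon. Pronoun mapping is
    # ambiguous in general ("her" can be "him" or "his"), which a real
    # rewriting system would need to resolve from context.
    SWAPS = {"he": "she", "she": "he", "him": "her", "her": "him",
             "his": "her", "actor": "actress", "actress": "actor"}
    PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", re.IGNORECASE)

    def counterfactual(text: str) -> str:
        """Produce a minimally edited counterfactual by swapping gendered terms."""
        def repl(match):
            word = match.group(0)
            swapped = SWAPS[word.lower()]
            return swapped.capitalize() if word[0].isupper() else swapped
        return PATTERN.sub(repl, text)

    example = ("The actor said he would donate his fee.", "positive")
    # Augment the training set with the counterfactual, keeping the label:
    # the sentiment should not depend on the gendered terms.
    augmented = [example, (counterfactual(example[0]), example[1])]
    print(augmented)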

Furthermore, we will explore approaches for the disentangled learning of domain-invariant and domain-dependent features. For example, training data can come from a finite set of domains, where each domain has its own input distribution but the causal relationship between input and output is invariant across domains. Training NLP models on domain-invariant features could potentially improve generalisation to out-of-domain data.
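
One concrete route to learning such domain-invariant predictors, sketched below under the assumption of PyTorch and toy per-domain data, is a penalty in the spirit of Invariant Risk Minimisation (Arjovsky et al., 2019), which rewards a representation whose classifier is simultaneously optimal in every training domain. This is one illustrative option, not the project's fixed method.

    import torch
    import torch.nn.functional as F

    def irm_penalty(logits, labels):
        """IRMv1 surrogate: squared gradient of the per-domain loss with
        respect to a dummy classifier scale fixed at 1.0."""
        scale = torch.ones(1, requires_grad=True)
        loss = F.binary_cross_entropy_with_logits(logits * scale, labels)
        grad = torch.autograd.grad(loss, [scale], create_graph=True)[0]
        return (grad ** 2).sum()

    # A small predictor over toy 10-dimensional inputs; in the project this
    # would be a PLM encoder producing text representations.
    model = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.ReLU(),
                                torch.nn.Linear(16, 1))
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Two toy domains with different input distributions. Labels here are
    # random placeholders; in the motivating setting the input-output
    # mechanism is shared across domains while input distributions differ.
    domains = [(torch.randn(64, 10) + shift,
                torch.randint(0, 2, (64, 1)).float())
               for shift in (0.0, 2.0)]

    for step in range(200):
        erm, penalty = 0.0, 0.0
        for x, y in domains:
            logits = model(x)
            erm = erm + F.binary_cross_entropy_with_logits(logits, y)
            penalty = penalty + irm_penalty(logits, y)
        loss = erm + 1.0 * penalty  # the penalty weight is a hyper-parameter
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()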

Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H. and Neubig, G., 2021. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv preprint arXiv:2107.13586.

Zhang, W.E., Sheng, Q.Z., Alhazmi, A. and Li, C., 2020. Adversarial attacks on deep-learning models in natural language processing: A survey. ACM Transactions on Intelligent Systems and Technology (TIST), 11(3), pp.1-41.

Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E., Brynjolfsson, E. et al., 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258.

Project ID

STAI-CDT-2023-KCL-25

Supervisor

Yulan He (https://www.kcl.ac.uk/people/yulan-he)

Category

Logic, Norms, Reasoning