Teaching Large Language Models To Perform Complex Reasoning

Large language models have become the backbone of most state-of-the-art NLP systems. By pre-training on very large datasets with unsupervised objectives, these models learn general-purpose representations of language, which can then be transferred to tasks requiring language understanding [1,2,3]. Recent advances in generative language modeling have shown that sufficiently large language models can solve even complicated tasks without additional task-specific fine-tuning, simply by processing natural language questions and instructions [4,5,6,7].

While these advancements are very promising for the field, large language models still have important shortcomings. One of them is the inability to perform multi-step reasoning over the given inputs. While a user might assume that the model goes through all the necessary reasoning steps to answer a given question, the behavior of current models is closer to guessing the correct answer based on words used in the input. This leads to incorrect answers, an inability to recognize when a question cannot be answered, and models that are easily confused by irrelevant additional context. These shortcomings raise issues with the reliability and trustworthiness of large language models, while also leaving them vulnerable to malicious attacks.

Logical reasoning is an area where symbolic systems excel. However, symbolic solvers usually lack the language-understanding coverage to represent all the complexity and variation of natural language, something which large language models handle effortlessly. Therefore, we propose combining these two directions of work, which have traditionally developed in isolation from each other.

This project will investigate methods for extending large language models with the ability to perform complex multi-step reasoning by learning from symbolic systems. One possible route to investigate is generating examples of reasoning using symbolic systems, then training large language models on these examples. Derivations in predicate logic can be translated into natural language, resulting in step-by-step descriptions of the reasoning process. In turn, large language models have shown greatly improved reasoning abilities when using chain-of-thought examples, which explicitly teach the model to go through a reasoning process before committing to an answer for a complex question [8,9,10]. By generating logic-based chain-of-thought examples from a symbolic system, we can explicitly teach language models to go through the necessary reasoning steps and improve their multi-step reasoning abilities.

[1] Devlin, Jacob, et al. “BERT: Pre-training of deep bidirectional transformers for language understanding.” arXiv preprint arXiv:1810.04805 (2018).

[2] Liu, Yinhan, et al. “RoBERTa: A robustly optimized BERT pretraining approach.” arXiv preprint arXiv:1907.11692 (2019).

[3] He, Pengcheng, et al. “DeBERTa: Decoding-enhanced BERT with Disentangled Attention.” arXiv preprint arXiv:2006.03654 (2020).

[4] Brown, Tom, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan et al. “Language models are few-shot learners.” Advances in neural information processing systems 33 (2020): 1877-1901.

[5] Rae, Jack W., Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides et al. “Scaling language models: Methods, analysis & insights from training Gopher.” arXiv preprint arXiv:2112.11446 (2021).

[6] Lieber, Opher, Or Sharir, Barak Lenz, and Yoav Shoham. “Jurassic-1: Technical details and evaluation.” White Paper. AI21 Labs 1 (2021).

[7] Smith, Shaden, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu et al. “Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model.” arXiv preprint arXiv:2201.11990 (2022).

[8] Wei, Jason, et al. “Chain-of-thought prompting elicits reasoning in large language models.” arXiv preprint arXiv:2201.11903 (2022).

[9] Gao, Luyu, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. “PAL: Program-aided Language Models.” arXiv preprint arXiv:2211.10435 (2022).

[10] Kojima, Takeshi, et al. “Large language models are zero-shot reasoners.” arXiv preprint arXiv:2205.11916 (2022).

Project ID



Dr Marek Rei, https://www.marekrei.com/


AI Planning, Logic