Extending Large Language Models Through Querying Symbolic Systems

Large language models have become the backbone of most state-of-the-art NLP systems. By pre-training on very large datasets with unsupervised objectives, these models learn rich representations of language that can then be transferred to tasks requiring language understanding [1,2,3]. Recent advances in generative language modeling have shown that sufficiently large models can solve even complicated tasks without task-specific fine-tuning, simply by processing natural language questions and instructions [4,5,6,7].

While these advances are very promising, large language models still have important shortcomings. One of these is a lack of factual consistency. Because language models simply predict tokens based on their context, they can generate incorrect information and hallucinate novel sequences when it comes to specific facts and details. This, in turn, raises safety and reliability concerns, as such models cannot be trusted in high-stakes applications.

This is exactly where symbolic systems excel, as they are designed to be precise and constrained by facts. However, symbolic systems usually lack the coverage of language understanding needed to represent the complexity and variability of natural language, something that large language models handle effortlessly. We therefore propose combining these two directions of work, which have traditionally developed in isolation from each other.

This project will investigate methods for extending large language models with the ability to query external systems as needed. During generation, the language model could decide that a particular fact should come from an external symbolic system rather than be generated directly. It could emit the corresponding function call as part of its output, and the result could then be fed back into the model's input as additional context. For example, the language model could look up relations in a knowledge base, query an SQL database, send mathematical operations to a calculator, or generate a Python script to answer a question. The language model would essentially learn to act as a manager: producing the output itself when it knows the answer, and delegating to an appropriate symbolic system when it does not.
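
To make this loop concrete, below is a minimal, self-contained sketch. Everything here is an illustrative assumption rather than a committed design: the [TOOL(args)] call syntax, the run_model stub (standing in for any autoregressive language model API, canned here so the example runs), and the single CALC tool.

```python
import re

# Stand-in for an autoregressive language model API; stubbed with canned
# continuations purely so the sketch runs end-to-end. The assumed convention
# is that the model emits "[TOOL(args)] -> " when it wants to delegate.
def run_model(context: str) -> str:
    if "44.97" in context:  # the tool result is already in the context
        return " So the total is $44.97."
    return "The total cost is [CALC(3 * 14.99)] -> "

# Registry of symbolic systems the model may delegate to. eval() over an
# empty namespace is only for illustration; a real calculator tool would
# use a proper expression parser, and other entries could wrap an SQL
# engine or a knowledge-base lookup.
TOOLS = {"CALC": lambda expr: str(eval(expr, {"__builtins__": {}}))}

# Matches a tool call the model has just emitted at the end of its output.
CALL_RE = re.compile(r"\[(\w+)\((.*?)\)\] -> $")

def generate_with_tools(prompt: str, max_steps: int = 5) -> str:
    """Alternate between model generation and tool execution: whenever the
    model emits a tool call, run the tool and splice the result back into
    the context before letting the model continue."""
    context = prompt
    for _ in range(max_steps):
        context += run_model(context)
        call = CALL_RE.search(context)
        if call is None:
            break  # no pending tool call: generation is finished
        tool, args = call.groups()
        context += TOOLS[tool](args)  # feed the result back as context
    return context

print(generate_with_tools("Q: What do three items at $14.99 each cost? A: "))
```

The key design choice this sketch illustrates is that the tool result becomes ordinary input context, so the model needs no architectural changes, only a learned convention for when and how to emit calls.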

Early research has already shown promising results in this direction. Gao et al. (2022) demonstrated how chain-of-thought prompting can be extended to use programs in addition to natural language [8]. Cheng et al. (2022) proposed a setup for interacting with database tables in which a large language model generates symbolic queries that can, in turn, call the model itself [9]. Schick et al. (2023) recently investigated integrating language models with tools such as a calendar, a question answering system, and a machine translation system, using model fine-tuning [10]. This project will develop a more general-purpose approach that allows language models to interact with a range of symbolic systems, thereby increasing their reliability and factual consistency, and will investigate more data-efficient techniques for teaching these abilities to language models.
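
As a concrete illustration of the program-aided style of [8], a few-shot prompt can ask the model to write its reasoning as executable Python, so the final arithmetic is performed by an interpreter rather than predicted token by token. The prompt wording and the mock completion below are assumptions for illustration, not taken from the paper:

```python
# A hypothetical program-of-thought prompt in the spirit of PAL [8].
PAL_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 cans with 3 balls each. \
How many tennis balls does he have now?

# solution in Python
initial_balls = 5
bought_balls = 2 * 3
answer = initial_balls + bought_balls

Q: A loaf of bread costs $2.50. How much do 4 loaves cost?

# solution in Python
"""

# A plausible model completion, mocked here for the sake of the example.
completion = "loaf_price = 2.50\nanswer = 4 * loaf_price\n"

scope = {}
exec(completion, scope)   # offload the arithmetic to the Python interpreter
print(scope["answer"])    # -> 10.0
```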

[1] Devlin, Jacob, et al. “BERT: Pre-training of deep bidirectional transformers for language understanding.” arXiv preprint arXiv:1810.04805 (2018).

[2] Liu, Yinhan, et al. “RoBERTa: A robustly optimized BERT pretraining approach.” arXiv preprint arXiv:1907.11692 (2019).

[3] He, Pengcheng, et al. “DeBERTa: Decoding-enhanced BERT with Disentangled Attention.” arXiv preprint arXiv:2006.03654 (2020).

[4] Brown, Tom, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, Prafulla Dhariwal, Arvind Neelakantan et al. “Language models are few-shot learners.” Advances in neural information processing systems 33 (2020): 1877-1901.

[5] Rae, Jack W., Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides et al. “Scaling language models: Methods, analysis & insights from training Gopher.” arXiv preprint arXiv:2112.11446 (2021).

[6] Lieber, Opher, Or Sharir, Barak Lenz, and Yoav Shoham. “Jurassic-1: Technical details and evaluation.” White Paper. AI21 Labs 1 (2021).

[7] Smith, Shaden, Mostofa Patwary, Brandon Norick, Patrick LeGresley, Samyam Rajbhandari, Jared Casper, Zhun Liu et al. “Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model.” arXiv preprint arXiv:2201.11990 (2022).

[8] Gao, Luyu, Aman Madaan, Shuyan Zhou, Uri Alon, Pengfei Liu, Yiming Yang, Jamie Callan, and Graham Neubig. “PAL: Program-aided Language Models.” arXiv preprint arXiv:2211.10435 (2022).

[9] Cheng, Zhoujun, Tianbao Xie, Peng Shi, Chengzu Li, Rahul Nadkarni, Yushi Hu, Caiming Xiong et al. “Binding language models in symbolic languages.” arXiv preprint arXiv:2210.02875 (2022).

[10] Schick, Timo, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. “Toolformer: Language models can teach themselves to use tools.” arXiv preprint arXiv:2302.04761 (2023).

Project ID

STAI-CDT-2023-IC-8

Supervisor

Dr Marek Rei (https://www.marekrei.com/)

Category

AI Planning