Large language models have become the backbone of most state-of-the-art NLP systems. By pre-training on very large datasets with unsupervised objectives, these models learn good representations of language composition, which can then be transferred to tasks requiring language understanding [1,2,3]. Recent advances in generative language modeling have shown that sufficiently large language models can solve even complex tasks without additional task-specific fine-tuning, simply by processing natural language questions and instructions [4,5,6,7].
While these advances are very promising for the field, large language models still have important shortcomings. One of these is a lack of factual consistency. Because language models simply predict words based on their context, they can easily generate incorrect information and hallucinate novel sequences when specific facts and details are required. This, in turn, raises concerns about the safety and reliability of these models, as they cannot be trusted in high-stakes applications.
This is precisely where symbolic systems excel, as they are designed to be precise and constrained by facts. However, symbolic systems usually do not have sufficient coverage of natural language to represent all of its complexity and variation, something which large language models handle effortlessly. We therefore propose combining these two directions of work, which have traditionally developed in isolation from each other.
This project will investigate methods for extending large language models with the ability to query external systems as needed. During generation, the language model could decide that a particular fact should come from an external symbolic system rather than being generated directly by the model itself. It could emit the corresponding function call as part of its output, and the result could then be fed back into the language model's input as additional context. For example, the language model could look up relations in a knowledge base, query an SQL database, send mathematical operations to a calculator, or generate a Python script to answer a question. The language model would essentially learn to act as a manager, generating the output itself when it knows the answer and delegating to an appropriate symbolic system when it does not.
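As a concrete illustration, the sketch below shows one possible orchestration loop around such a model. The CALL(tool, "argument") syntax, the tool registry, and the stubbed generate() function are illustrative assumptions made for this example, not a committed design; in a real system, generate() would sample from a language model trained or prompted to emit such calls.

```python
import re

# A minimal sketch of the proposed generate-call-resume loop. The tool
# names, the CALL(tool, "argument") syntax, and the stub generate() are
# hypothetical assumptions for illustration only.
TOOLS = {
    # Toy calculator: evaluates arithmetic expressions (no builtins exposed).
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    # Toy knowledge base lookup.
    "knowledge_base": lambda q: {"capital of France": "Paris"}.get(q, "unknown"),
}

CALL_PATTERN = re.compile(r'CALL\((\w+),\s*"([^"]*)"\)')

def generate(context: str) -> str:
    """Stand-in for the language model. A real system would sample from an
    LM that emits CALL(...) when it decides a fact should come from an
    external symbolic system instead of its own parameters."""
    if "RESULT" not in context:
        return 'The answer is CALL(knowledge_base, "capital of France").'
    return "The capital of France is Paris."

def answer(question: str, max_steps: int = 5) -> str:
    context = question
    for _ in range(max_steps):
        output = generate(context)
        match = CALL_PATTERN.search(output)
        if match is None:
            return output  # the model answered directly, no delegation needed
        tool, argument = match.groups()
        result = TOOLS[tool](argument)
        # Feed the symbolic system's result back in as additional context.
        context += f"\n{output}\nRESULT({tool}): {result}"
    return output

print(answer("What is the capital of France?"))
```

The same loop generalizes to any tool the registry exposes, which matches the manager framing above: the model delegates only when it chooses to emit a call, and otherwise answers directly.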
Some early research has shown promising results in this direction. Gao et al. (2022) demonstrated how chain-of-thought prompting can be extended to use programs in addition to natural language [8]. Cheng et al. (2022) proposed a setup for interacting with database tables in which the large language model can also issue queries back to itself [9]. Schick et al. (2023) recently investigated fine-tuning language models to use tools such as a calendar, a question answering system, or a machine translation system. This project will develop a more general-purpose approach that allows language models to interact with a variety of symbolic systems, thereby increasing their reliability and factual consistency, and will investigate more data-efficient techniques for teaching models these abilities.