Trusted test suites for safe agent-based simulations

Agent-based models (ABMs) are an AI technique for improving our understanding of complex real-world interactions and their "emergent behaviour". ABMs are used to develop and test theories, or to explore how interventions might change behaviour. For example, we are working on a model of staff and patient interaction in emergency medicine, exploring how interventions affect efficiency and safety. With the Francis Crick Institute, we study how cells coordinate and manage the growth of blood vessels.

To create trust in the results of ABM simulations, assurances are needed about their correctness. This requires the application of systematic software engineering techniques to ABMs, just like any other software. For example, automated test suites that can be run after every change have become state of the art throughout software engineering. They provide a strong safeguard against regressions: the accidental introduction of a problem in one part of a complex piece of software through changes made in an apparently unrelated part. However, testing of ABMs has so far received only very limited attention.

Testing ABMs is not a straightforward task. At least two technical challenges need to be addressed:

  1. Simulations are stochastic processes; different runs will produce different results. The simulation must produce meaningful results not just for one run, but across multiple runs in a statistically significant manner. Understanding how many runs to execute and which parameters to vary across these runs to obtain statistical significance is non-trivial.

  2. Establishing what constitutes a successful test run is, itself, non-trivial: unlike typical software unit tests, we are looking for complex, often temporal, properties to hold over traces of the states and state changes of large sets of interacting agents. The source information to be evaluated is contained in textual logs from simulation runs, which often present information at levels of granularity different from those required for test evaluation.
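To make these two challenges concrete, the sketch below is a toy stand-in, not SPARTAN's or MC2MABS's actual API: the simulation, the temporal property, the number of runs, and the pass-rate threshold are all invented for illustration. It runs a seeded stochastic simulation repeatedly, evaluates a temporal ("eventually") property over each run's trace, and aggregates the outcomes across runs:

```python
import random


def simulate(seed, steps=50, n_agents=10):
    """Toy stochastic ABM stand-in: each agent performs a random walk.
    The trace is the list of per-step agent positions."""
    rng = random.Random(seed)
    positions = [0] * n_agents
    trace = []
    for _ in range(steps):
        positions = [p + rng.choice([-1, 1]) for p in positions]
        trace.append(list(positions))
    return trace


def eventually_escapes(trace, bound):
    """Temporal property over a trace: at some step, some agent
    has drifted at least `bound` from its starting position."""
    return any(any(abs(p) >= bound for p in state) for state in trace)


def run_suite(n_runs=100, bound=5, min_pass_rate=0.9):
    """Run the simulation n_runs times with distinct seeds and check
    whether the property holds often enough to call the test a pass.
    The 0.9 threshold is an arbitrary illustrative choice; a real tool
    would derive it from a proper statistical test."""
    passes = sum(eventually_escapes(simulate(seed), bound)
                 for seed in range(n_runs))
    rate = passes / n_runs
    return rate, rate >= min_pass_rate


rate, ok = run_suite()
```

Because each run is seeded, the aggregate result is reproducible, which is what makes such a check usable as a regression test; deciding how many runs and what threshold give genuine statistical significance is exactly the non-trivial part noted above.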

Previous work on the SPARTAN [] and MC2MABS [] tools is beginning to address some of these challenges. However, these tools require substantial expertise in ABMs as well as statistics, temporal logic, model checking, and the specifics of encoding a particular domain using particular ABM frameworks (to enable correct interpretation of the log files). This means that these tools are inaccessible to domain experts, leading to a lack of trust in ABMs.

The aim of this PhD project is to develop a domain-specific modelling approach that allows domain experts to express the properties to be tested in a language close to the problem domain. These property specifications can then be translated automatically into a test suite built on top of SPARTAN and MC2MABS, without exposing end users to implementation details of the ABM or to the specifics of how simulation runs are encoded in simulation-engine log files. This contributes to safe AI by making ABMs more reliable. At the same time, it increases trust in the simulation results, because the tests executed can be inspected, understood, and manipulated by domain experts rather than only by technical experts.
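To give a flavour of the gap such a translation would bridge, the following sketch is entirely hypothetical: the domain-level property syntax, the log-line format, and the parsing code are invented for illustration, since designing the actual language is the subject of the project. It shows a property phrased in emergency-department vocabulary alongside the log-level interpretation a generated test would need:

```python
import re

# Hypothetical domain-level property, as a domain expert might write it
# (illustrative syntax only; not the project's actual language):
PROPERTY = "eventually every patient is triaged"

# Assumed log format for this sketch: one line per agent per step,
# e.g. "step=3 agent=patient7 state=triaged".
LOG_LINE = re.compile(r"step=(\d+) agent=(\w+) state=(\w+)")


def parse_log(lines):
    """Reconstruct a per-step snapshot of agent states from textual logs."""
    by_step = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            step, agent, state = int(m.group(1)), m.group(2), m.group(3)
            by_step.setdefault(step, {})[agent] = state
    return [by_step[s] for s in sorted(by_step)]


def eventually_all_triaged(snapshots):
    """Log-level meaning of the domain property above."""
    return any(all(state == "triaged" for state in snap.values())
               for snap in snapshots)


log = [
    "step=0 agent=patient1 state=waiting",
    "step=0 agent=patient2 state=waiting",
    "step=1 agent=patient1 state=triaged",
    "step=1 agent=patient2 state=waiting",
    "step=2 agent=patient1 state=triaged",
    "step=2 agent=patient2 state=triaged",
]
```

The point of the proposed approach is that the domain expert would only ever see the first line; everything below it, from the log regex to the quantifier structure, would be generated automatically.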

Project ID