In autonomous and multi-agent systems, players are normally assumed to be rational and to cooperate or compete in groups to achieve their overall objectives. Game theory provides useful methods for studying the resulting interactions, through notions such as cooperative and competitive games, local and global optima, power in coalitions, and Nash and other forms of equilibria.
While this has produced an extensive game-theoretic literature, two aspects are not addressed in it. Firstly, agents are normally assumed to be defined by discrete (memoryful or memoryless) policies; these do not incorporate notions of continuous decision making, typical in control theory, or hybrid policies, including policies synthesised from data. Secondly, policies are synthesised with respect to a single or global utility function, without taking into account considerations of safety and trustworthiness. The objective of this project is to lay the foundations for such an analysis.
Objective: Study systems of multiple autonomous agents that are governed by neural networks and interact with linear and non-linear environments. Examples arise in several emerging AI application areas, such as smart traffic networks, smart/active buildings and smart grids.
Control theory has traditionally focused on the design of such policies when a centralised utility notion is available and the system dynamics are known to obey prescribed models (linear or non-linear equations). Game theory allows similar questions to be tackled when multiple objective functions are introduced, together with the strategic interactions between agents that these entail. Unlike the state of the art, where specifications are met by maximising or minimising some utility notion, in this work they will be tackled in conjunction with safety constraints. Crucially, these will be model based and will be addressed from a logic, verification, or optimisation standpoint.
In addition to this radical paradigm shift, on the technical side the work will develop robust and theoretically sound parameterisations of dynamical behaviours derived from past input-output data. These should allow more flexible deployment of such technologies in contexts where models are rarely available and may evolve over time.
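To make the idea of parameterising dynamical behaviour from past input-output data more concrete, the sketch below uses one well-known construction for linear time-invariant systems: stacking recorded input and output windows into a Hankel matrix and checking that a fresh trajectory lies in its column span. This is only an illustration under stated assumptions, not the project's prescribed method. The scalar system, its parameters `a` and `b`, the window length `depth`, and the helper names `hankel` and `simulate` are all hypothetical choices made for the example.

```python
import numpy as np


def hankel(signal, depth):
    """Stack every length-`depth` window of a 1-D signal as a column."""
    cols = len(signal) - depth + 1
    return np.column_stack([signal[i:i + depth] for i in range(cols)])


def simulate(a, b, u, x0=0.0):
    """Roll out the scalar system x[t+1] = a*x[t] + b*u[t] with y[t] = x[t]."""
    x, y = x0, []
    for u_t in u:
        y.append(x)
        x = a * x + b * u_t
    return np.array(y)


rng = np.random.default_rng(0)
a, b = 0.9, 0.5  # hypothetical "unknown" system, used only to generate data

# Past data: one recorded experiment with a random (exciting) input.
u_data = rng.standard_normal(50)
y_data = simulate(a, b, u_data)

# Data-based representation: input windows stacked on top of output windows.
depth = 5
H = np.vstack([hankel(u_data, depth), hankel(y_data, depth)])

# A fresh trajectory of the same system, from a different initial state.
u_new = rng.standard_normal(depth)
y_new = simulate(a, b, u_new, x0=1.3)
w_new = np.concatenate([u_new, y_new])

# If the data are rich enough, w_new is a linear combination of the columns
# of H, so the least-squares residual should be numerically negligible.
g, *_ = np.linalg.lstsq(H, w_new, rcond=None)
residual = np.linalg.norm(H @ g - w_new)
print("residual:", residual)
```

The model parameters appear only in the data generator. The representation itself is built purely from recorded signals, which is what makes this kind of construction attractive when a model is unavailable. Noise, non-linear dynamics and time variation would all require the more robust treatment that the project description refers to.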
Requirements:
Expertise in Control and Optimisation is highly beneficial. Some knowledge of machine learning is advantageous.