Understanding Distribution Shift with Logic-based Reasoning and Verification

Data-driven approaches have proven powerful in a variety of domains, from computer vision to NLP. However, in some domains, such as attack detection in security, the arms race between attackers and defenders causes an ever-growing distribution shift in the main characteristics of the detection task. Moreover, we do not yet have a good theoretical understanding of how distribution shift should be defined, or of what its root causes and effects are.

This project will define a new symbolic AI framework, built on logic-based reasoning, to devise techniques for understanding the root causes and effects of distribution shift under different assumptions and scenarios [b,c,d].

One possible approach is to extend the propositional logic for streaming data with sliding windows proposed in LARS [e]. Propositional logic would make it possible to define modifications to a given abstract representation (e.g., a mutation in a software abstraction) that entail a particular type of distribution shift (e.g., covariate shift, label shift, or concept shift [b]). Collecting such logic statements into a knowledge base would then make it possible to run queries that clarify the causes and effects of distribution shift, and to determine which type of drift follows from given pre-conditions. This approach could later be extended with probabilistic logic frameworks [c] and Bayesian approaches [f] for reasoning under uncertainty, to get closer to realistic scenarios in which some information is known only with a certain probability.
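As a rough illustration of the idea above, the following minimal sketch forward-chains a small set of propositional Horn rules over the atoms that fall inside a time-based sliding window of a stream. It is not the LARS formalism itself and not the project's method: the atom names, the rules linking modifications to shift types, the example stream, and the helper names (`forward_chain`, `window_atoms`, `shifts_at`) are all hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A Horn rule: if every atom in the body holds, the head holds.
Rule = Tuple[FrozenSet[str], str]

# Hypothetical knowledge base (illustrative assumptions, not taken from the source).
RULES: List[Rule] = [
    (frozenset({"code_obfuscation"}), "input_distribution_changed"),
    (frozenset({"new_malware_family"}), "class_prior_changed"),
    (frozenset({"behaviour_relabelled"}), "class_conditional_changed"),
    (frozenset({"input_distribution_changed", "labelling_stable"}), "covariate_shift"),
    (frozenset({"class_prior_changed"}), "label_shift"),
    (frozenset({"class_conditional_changed"}), "concept_shift"),
]

SHIFT_TYPES: Set[str] = {"covariate_shift", "label_shift", "concept_shift"}


def forward_chain(facts: Iterable[str], rules: List[Rule]) -> Set[str]:
    """Return all atoms derivable from `facts` by applying `rules` to a fixpoint."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived


def window_atoms(stream: Iterable[Tuple[int, str]], now: int, size: int) -> Set[str]:
    """Atoms observed at time points in [now - size, now] (a time-based window)."""
    return {atom for t, atom in stream if now - size <= t <= now}


def shifts_at(stream: Iterable[Tuple[int, str]], now: int, size: int,
              rules: List[Rule] = RULES) -> Set[str]:
    """Query: which shift types are entailed by the current window's atoms?"""
    return forward_chain(window_atoms(stream, now, size), rules) & SHIFT_TYPES


# Hypothetical stream of (time point, observed atom) pairs.
STREAM = [(1, "labelling_stable"), (2, "code_obfuscation"), (7, "new_malware_family")]

print(shifts_at(STREAM, now=3, size=3))  # {'covariate_shift'}
print(shifts_at(STREAM, now=8, size=3))  # {'label_shift'}
```

Restricting the rules to Horn clauses keeps the query a simple fixpoint computation; a fuller treatment would need the window and temporal operators of the underlying stream logic, and eventually probabilities attached to rules or facts.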

The final objective of this symbolic AI framework is to gain a deeper understanding of the distribution shift phenomenon, including its root causes and effects, from a model-driven perspective, and to identify logic-based constraints that could later be embedded in data-driven algorithms to improve their resilience against distribution shift.

[a] Pendlebury, Feargus, et al. “TESSERACT: Eliminating experimental bias in malware classification across space and time.” _USENIX Security Symposium_, 2019.
[b] Moreno-Torres, Jose G., et al. “A unifying view on dataset shift in classification.” _Pattern Recognition_, 2012.
[c] Jajodia, Sushil, et al. “A probabilistic logic of cyber deception.” _IEEE Transactions on Information Forensics and Security_, 2017.
[d] Poon, Hoifung, and Pedro Domingos. “Sound and efficient inference with probabilistic and deterministic dependencies.” _AAAI_, 2006.
[e] Beck, Harald, Minh Dao-Tran, and Thomas Eiter. “LARS: A logic-based framework for analytic reasoning over streams.” _Artificial Intelligence_, 2018.
[f] De Campos, Luis M., Juan F. Huete, and Serafin Moral. “Probability intervals: a tool for uncertain reasoning.” _International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems_, 1994.

Project ID

STAI-CDT-2023-KCL-17

Supervisor

Fabio Pierazzi (https://www.kcl.ac.uk/people/fabio-pierazzi)

Category

Logic, Reasoning