Data-driven approaches have proven powerful in a variety of domains, from computer vision to NLP. However, in some domains, such as attack detection in security, the arms race between attackers and defenders causes an ever-growing distribution shift in the main characteristics of the detection task. We do not yet have a good theoretical understanding of how distribution shift should be defined, nor of the root causes and effects of such drift.
This project will define a new symbolic AI framework, based on logic-based reasoning, to devise techniques for understanding the root causes and effects of distribution shift under different assumptions and scenarios [b,c,d].
One possible approach is to extend the propositional logic for streaming data with sliding windows originally proposed in LARS [e]. Propositional logic would make it possible to define modifications to an abstract representation (e.g., a mutation in a software abstraction) that entail a particular type of distribution shift (e.g., covariate shift, label shift, or concept shift [b]). A knowledge base of such logic statements could then be queried to better understand the causes and effects of distribution shift, and even to determine which type of drift follows from given preconditions. This approach could later be extended with probabilistic logic frameworks [c] and Bayesian approaches [f] for reasoning under uncertainty, to get closer to realistic scenarios in which some information can only be conjectured with a certain probability.
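To make the idea of a queryable knowledge base more concrete, the following is a minimal sketch in Python. It is not the LARS formalism [e] or its syntax: it assumes a much simpler setting of propositional Horn rules evaluated over a time-based sliding window. All atom names (such as `api_calls_renamed`), the rules themselves, and the helper functions `window`, `entail`, and `explain` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Propositional Horn rule: if every atom in `body` holds, `head` holds."""
    body: frozenset
    head: str


# Hypothetical knowledge base; the atoms and rules are illustrative assumptions.
RULES = [
    Rule(frozenset({"api_calls_renamed"}), "features_changed"),
    Rule(frozenset({"features_changed", "labels_stable"}), "covariate_shift"),
    Rule(frozenset({"class_ratio_changed", "features_given_label_stable"}),
         "label_shift"),
    Rule(frozenset({"same_features_new_label"}), "concept_shift"),
]

# Hypothetical stream of (timestamp, atom) observations.
STREAM = [
    (1, "labels_stable"),
    (4, "api_calls_renamed"),
    (9, "same_features_new_label"),
]


def window(stream, now, size):
    """Atoms observed in the time-based sliding window [now - size, now]."""
    return {atom for t, atom in stream if now - size <= t <= now}


def entail(facts, rules):
    """Forward chaining: apply the rules until no new atom can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.body <= derived and rule.head not in derived:
                derived.add(rule.head)
                changed = True
    return derived


def explain(goal, facts, rules, seen=frozenset()):
    """Return observed atoms that support `goal`, or None if not derivable."""
    if goal in facts:
        return {goal}
    if goal in seen:  # guard against cyclic rule sets
        return None
    for rule in rules:
        if rule.head != goal:
            continue
        support = set()
        for atom in rule.body:
            sub = explain(atom, facts, rules, seen | {goal})
            if sub is None:
                break
            support |= sub
        else:
            return support
    return None


# Query the knowledge base at two points in time.
early_facts = window(STREAM, now=5, size=5)
late_facts = window(STREAM, now=10, size=5)
early = entail(early_facts, RULES)
late = entail(late_facts, RULES)
root_causes = explain("covariate_shift", early_facts, RULES)
```

In this sketch, `entail` answers the forward question (which type of drift follows from the preconditions visible in the current window), while `explain` answers the backward question (which observations are candidate root causes of a given drift type). A probabilistic extension would attach weights to rules or observations, which this sketch deliberately leaves out.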
The final objective of this symbolic AI framework is to gain a deeper understanding of the distribution shift phenomenon from a model-driven perspective, including its root causes and effects, and to identify logic-based constraints that could later be embedded in data-driven algorithms to improve their resilience against distribution shift.