Trustworthy AI for DNA Sequencing

DNA sequencing is becoming ever more important for medical applications, be it for predictive medicine or precision/personalised medicine. At the same time, DNA sequencing is starting to use AI to map the signals produced by the sequencer to nucleotides/bases or to whole sequences. This is very powerful and makes sequencing cheaper and faster. However, as with much of AI, it is not entirely clear after the fact how a result was obtained, i.e. in this case how a series of signals was mapped to a sequence. For applications as crucial as precision medicine, this is untenable, not least because a wrong mapping may lead to a different diagnosis and thus to the wrong treatment, with adverse impact on the patient and increased medical cost.

The goal of this project is thus to implement the mapping between the signals of the sequencing hardware and nucleotides using rule-based logic. Further, we want to implement the means to track the provenance of the mapping, ultimately enabling trust in the sequencing process: if the mapping is wrong, the process can be inspected and fixed. As a starting point we will use recent work (Provenance for Probabilistic Logic Programs at EDBT) and develop our own provenance model to precisely track which data and which rules lead to a particular result.
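As a rough illustration of the intended approach only, the Python sketch below maps toy signal windows to bases with explicit rules and records, for every call, which rule fired on which input data. The Rule and Provenance classes, the call_base function, and the threshold rules are all hypothetical assumptions for this sketch; they do not reflect the project's actual rule set, the Oxford Nanopore signal format, or the provenance model to be developed.

```python
# Minimal sketch: rule-based signal-to-nucleotide mapping with a provenance
# record per call. All names and thresholds here are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Rule:
    name: str
    condition: Callable[[List[float]], bool]  # predicate over a signal window
    base: str                                 # nucleotide emitted if the rule fires

@dataclass
class Provenance:
    window: List[float]        # the input signal values used for this call
    fired_rule: Optional[str]  # which rule produced the call (None = no call)
    base: Optional[str]        # the nucleotide that was inferred

def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)

# Toy rules keyed on the mean current level of a window; real rules would be
# curated/derived from the sequencing chemistry and be far richer.
RULES = [
    Rule("low_current_is_A",  lambda w: mean(w) < 80.0,           "A"),
    Rule("mid_current_is_C",  lambda w: 80.0 <= mean(w) < 100.0,  "C"),
    Rule("high_current_is_G", lambda w: 100.0 <= mean(w) < 120.0, "G"),
    Rule("top_current_is_T",  lambda w: mean(w) >= 120.0,         "T"),
]

def call_base(window: List[float]) -> Provenance:
    """Apply rules in order and record which rule and which data produced the call."""
    for rule in RULES:
        if rule.condition(window):
            return Provenance(window, rule.name, rule.base)
    return Provenance(window, None, None)

# Usage: call a few windows, then trace a suspect call back to its rule and input.
trace = [call_base(w) for w in ([72.1, 75.3], [95.0, 97.2], [130.4, 128.9])]
suspect = trace[1]
print(suspect.base, "came from rule", suspect.fired_rule, "on input", suspect.window)
```

The point of the sketch is only the shape of the idea: because every inference step is an explicit rule application, the provenance record pinpoints both the rule and the raw signal behind any disputed base call.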

Having implemented the mapping between signal and nucleotide with logic-based rules and incorporated provenance, we can trace back exactly which rules and input signals were used to infer a nucleotide whenever an error occurs. The resulting provenance can also help to find bugs and errors in the mapping process and can be used to iteratively improve the mapping.

This traceability/provenance will ultimately enable trust in the sequencing result and, more broadly, safety in the context of the personalised treatment chosen. The project builds on Oxford Nanopore software (as well as their hardware, which we will leave unchanged) and will replace the ML models that currently perform the mapping with rule-based logic to enable trust.

Project ID

STAI-CDT-2021-IC-16

Supervisor