
NDFA-based inexact pattern-matching through local optimum

Darius Galis

Abstract:

Pattern-matching techniques are among the most common ways of identifying the presence of sequences in a data pool, with different encodings suited to different real-world applications. The fields that employ such techniques range from bioinformatics, forensic analysis, and malware detection to compiler implementation and text matching. This presentation introduces a new approach to inexact pattern-matching that uses a nondeterministic finite automaton (NDFA), derived from the DFA-based Aho-Corasick approach, to tackle current problems such as detecting polymorphic malware behavior and antimicrobial resistance genes. Our approach detects similar subsequences by applying a diverse set of metrics, each computing a local optimum, to the existing data through a sliding window. We discuss the existing approaches, present our methodology, and compare it with the traditional techniques in real-world use cases. We also draw conclusions about potential further applications of the proposed approach in other research niches and outline its benefits and drawbacks.
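To make the sliding-window idea concrete, the following is a minimal sketch and not the method presented in the talk: it slides a window of the pattern's length over the text, scores each window with a single metric, and keeps the offsets where the score is a local minimum below a threshold. The choice of Hamming distance, the names `hamming` and `local_optima`, and the `max_dist` parameter are all assumptions made for illustration; the abstract does not specify the metrics used, and no automaton is built here.

```python
from __future__ import annotations


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def local_optima(text: str, pattern: str, max_dist: int) -> list[tuple[int, int]]:
    """Return (offset, distance) pairs for windows whose distance to the
    pattern is at most max_dist and no larger than both neighbouring windows.

    Illustrative only: Hamming distance stands in for whatever metrics
    the actual approach uses.
    """
    m = len(pattern)
    if m == 0 or len(text) < m:
        return []
    # Distance of every window of length m to the pattern.
    dists = [hamming(text[i:i + m], pattern) for i in range(len(text) - m + 1)]
    hits = []
    for i, d in enumerate(dists):
        if d > max_dist:
            continue
        # Windows at the edges have only one neighbour; treat the missing
        # neighbour as strictly worse so the edge can still be a minimum.
        left = dists[i - 1] if i > 0 else d + 1
        right = dists[i + 1] if i + 1 < len(dists) else d + 1
        if d <= left and d <= right:
            hits.append((i, d))
    return hits


if __name__ == "__main__":
    print(local_optima("ACGTACGAACGT", "ACGT", max_dist=1))
```

With `max_dist=1` the inexact occurrence `ACGA` at offset 4 is reported alongside the two exact occurrences; with `max_dist=0` only the exact ones remain.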