Skip to main content

An Optimized Pattern Matching Algorithm Based on Suffix Arrays with LCP-Guided Binary Search: A Time-Efficient Approach for Large-Scale Text Processing

Research Abstract

Pattern matching is a fundamental operation in computational biology, information retrieval, and text processing. Despite advances in algorithms, efficiently searching large-scale texts, particularly genomic sequences spanning billions of base pairs, remains a significant challenge. Existing approaches face substantial limitations including Boyer-Moore algorithm's poor worst-case performance of O(mn) and limited scalability on large datasets, KMP algorithm's high memory overhead and poor cache locality despite O(m+n) complexity, traditional suffix array methods' expensive O(mlogn) search time with costly preprocessing, and hash-based approaches like Rabin-Karp's vulnerability to hash collisions and poor performance on repetitive sequences common in genomic data. This paper presents LCP-Optimized Suffix Array (LOSA), a high-performance pattern matching algorithm that combines suffix arrays with Longest …

Research Authors
Mohmoud K Diab, Taysir Hassan A Soliman, Amr M Mohamed, Ibrahim E Elsemman
Research Date
Research Department
Research Publisher
IEEE
Research Year
2025