An Optimized Pattern Matching Algorithm based on Suffix Arrays with LCP-Guided Binary Search: A Time-Efficient Approach for Large-Scale Text Processing

Research Abstract

Pattern matching is a fundamental operation in
computational biology, information retrieval, and text processing.
Despite advances in algorithms, efficiently searching large-scale
texts, particularly genomic sequences spanning billions of base
pairs, remains a significant challenge. Existing approaches face
substantial limitations including Boyer-Moore algorithm’s poor
worst-case performance of O(mn) and limited scalability on large
datasets, KMP algorithm’s high memory overhead and poor
cache locality despite O(m+n) complexity, traditional suffix array
methods’ expensive O(mlogn) search time with costly preprocessing,
and hash-based approaches like Rabin-Karp’s vulnerability
to hash collisions and poor performance on repetitive sequences
common in genomic data.
This paper presents LCP-Optimized Suffix Array (LOSA),
a high-performance pattern matching algorithm that combines
suffix arrays with Longest Common Prefix (LCP) array optimizations
and parallel processing to achieve faster search times
with scalable memory usage. LOSA reduces the worst-case search
complexity from O(m log n) to nearly O(m + log n), where
m length pattern and n length the text. This improvement
is achieved through LCP-aware skips to minimize redundant
comparisons, as well as parallelized construction and query
phases, for modern multicore systems.

Research Authors

Mohmoud K Diab, Taysir Hassan A Soliman, Amr M Mohamed, Ibrahim E Elsemman

Research Date

Tue, 21 Oct 2025 - 12:00

Research Department

Information Systems Department

Research Journal

IEEE Xplore

Research Member

Mahmoud Kotb abdo Diab

Taiseer Hassan Abdel-Hamid Sulaiman

Ibrahim E. Elsemman

Research Publisher

IEEE

Research Year

2025

Faculty of Computers and Information

An Optimized Pattern Matching Algorithm based on Suffix Arrays with LCP-Guided Binary Search: A Time-Efficient Approach for Large-Scale Text Processing

آخر الأبحاث

Assiut
University

Important Links

Our Address

An Optimized Pattern Matching Algorithm based on Suffix Arrays with LCP-Guided Binary Search: A Time-Efficient Approach for Large-Scale Text Processing

آخر الأبحاث

Assiut University

Important Links

Our Address

Assiut
University