Welcome!
To use the personalized features of this site, please log in or register.
If you have forgotten your username or password, we can help.
My Menu
Saved Items

Optimal Exact String Matching Based on Suffix Arrays

Mohamed Ibrahim AbouelhodaContact Information, Enno OhlebuschContact Information and Stefan KurtzContact Information

(6)  Faculty of Technology, University of Bielefeld, P.O. Box 10 01 31, 33501 Bielefeld, Germany
Abstract
Using the suffix tree of a string S, decision queries of the type “Is P a substring of S?” can be answered in O(|P|) time and enumeration queries of the type “Where are all z occurrences of P in S?” can be answered in O(|P|+z) time, totally independent of the size of S. However, in large scale applications as genome analysis, the space requirements of the suffix tree are a severe drawback. The suffix array is a more space economical index structure. Using it and an additional table, Manber and Myers (1993) showed that decision queries and enumeration queries can be answered in O(|P|+log|S|) and O(|P|+log|S|+z) time, respectively, but no optimal time algorithms are known. In this paper, we show how to achieve the optimal O(|P|) and O(|P| + z) time bounds for the suffix array. Our approach is not confined to exact pattern matching. In fact, it can be used to efficiently solve all problems that are usually solved by a top-down traversal of the suffix tree. Experiments show that our method is not only of theoretical interest but also of practical relevance.

Contact Information Mohamed Ibrahim Abouelhoda
Email: mibrahim@TechFak.Uni-Bielefeld.DE

Contact Information Enno Ohlebusch
Email: enno@TechFak.Uni-Bielefeld.DE

Contact Information Stefan Kurtz
Email: kurtz@TechFak.Uni-Bielefeld.DE
Fulltext Preview (Small, Large)
Image of the first page of the fulltext

References secured to subscribers.



Export this chapter
Export this chapter as RIS | Text
 
Referenced by
1 newer article

  1. Maaß, Moritz G. (2006) Matching statistics: efficient computation and a new practical algorithm for the multiple common substring problem. Software Practice and Experience 36(3)
    [CrossRef]
Remote Address: 38.107.191.108 • Server: mpweb24
HTTP User Agent: CCBot/1.0 (+http://www.commoncrawl.org/bot.html)