For 30 years the Lempel–Ziv factorization LZ
x
of a string
x =
x[1..
n] has been a fundamental data structure of string processing, especially valuable for string compression and for computing
all the repetitions (runs) in
x. Traditionally the standard method for computing LZ
x
was based on Θ(
n)-time (or, depending on the measure used,
O(
n log
n)-time) processing of the suffix tree ST
x
of
x. Recently Abouelhoda
et al. proposed an efficient Lempel–Ziv factorization algorithm based on an “enhanced” suffix array – that is, a suffix array SA
x
together with supporting data structures, principally an “interval tree”. In this paper we introduce a collection of fast
space-efficient algorithms for LZ factorization, also based on suffix arrays, that in theory as well as in many practical
circumstances are superior to those previously proposed; one family out of this collection achieves true Θ(
n)-time alphabet-independent processing in the worst case by avoiding tree structures altogether.
Mathematics Subject Classification (2000). 68W05
Keywords. Lempel–Ziv factorization - suffix array - suffix tree - LZ factorization
The work of the first and third authors was supported in part by grants from the Natural Sciences & Engineering Research Council
of Canada.