View Related Documents

Abstract

We consider exact string searching in compressed texts. We utilize a semi-static compression scheme, where characters of the text are encoded as variable-length sequences of base symbols, each of which is represented by a fixed number of bits. In addition, we split the symbols into two parallel files in order to allow faster access. Our searching algorithm is a modification of the Boyer-Moore-Horspool algorithm. Our approach is practical and enables faster searching of string patterns than earlier character-based compression models and the best Boyer-Moore variants in uncompressed texts.
This work has been supported by the National Technology Agency (Tekes).

Fulltext Preview

Image of the first page of the fulltext document