Lecture Notes in Computer Science, 2003, Volume 2676/2003, 129-143, DOI: 10.1007/3-540-44888-8_10

Optimal Partitions of Strings: A New Class of Burrows-Wheeler Compression Algorithms

Raffaele Giancarlo and Marinella Sciortino

View Related Documents

Abstract

The Burrows-Wheeler transform [1] is one of the mainstays of lossless data compression. In most cases, its output is fed to Move to Front or other variations of symbol ranking compression. One of the main open problems [2] is to establish whether Move to Front, or more in general symbol ranking compression, is an essential part of the compression process. We settle this question positively by providing a new class of Burrows-Wheeler algorithms that use optimal partitions of strings, rather than symbol ranking, for the additional step. Our technique is a quite surprising specialization to strings of partitioning techniques devised by Buchsbaum et al. [3] for two-dimensional table compression. Following Manzini [4], we analyze two algorithms in the new class, in terms of the k-th order empirical entropy of a string and, for both algorithms, we obtain better compression guarantees than the ones reported in [4] for Burrows-Wheeler algorithms that use Move to Front.
Both authors are partially supported by Italian MURST Project of National Relevance “Bioinformatica e Ricerca Genomica”. Additional support is provided to the first author by FIRB Project “Bioinformatica per la Genomica e la Proteomica” and to the second author by Italian MURST Project of National Relevance “Linguaggi Formali ed Automi: Teoria e Applicazioni”.

Fulltext Preview

Image of the first page of the fulltext document