Lecture Notes in Computer Science, 1994, Volume 807/1994, 164-172, DOI: 10.1007/3-540-58094-8_15

Shortest common superstrings for strings of random letters

Kenneth S. Alexander

View Related Documents

Abstract

Given a finite collection of strings of letters from a fixed alphabet, it is of interest, in the contexts of data compression and DNA sequencing, to find the length of the shortest string which contains each of the given strings as a consecutive substring. In order to analyze the average behavior of the optimal superstring length, substrings with a specified collection of lengths are considered with the letters selected independently at random. An asymptotic expression, as the collection of lengths becomes large, is obtained for the savings from compression, that is, the difference between the uncompressed (concatenated) length and the optimal superstring length.
Research supported by NSF grant DMS-9206139
The author would like to thank M. Waterman and G. Benson for helpful discussions; M. Waterman in particular suggested considering problems of this type.

Fulltext Preview

Image of the first page of the fulltext document