String comparison is a fundamental problem in computer science, with applications in areas such as computational biology,
text processing or compression. In this paper we address the minimum common string partition problem, a string comparison
problem with tight connection to the problem of sorting by reversals with duplicates, a key problem in genome rearrangement.
A
partition of a string
A is a sequence
P=(P1,P2,...Pm){\mathcal P}=(P_{1},P_{2},...P_{m}) of strings, called the
blocks, whose concatenation is equal to
A. Given a partition
P{\mathcal P} of a string
A and a partition
Q{\mathcal Q} of a string
B, we say that the pair
áP,Qñ\langle\mathcal{P,Q}\rangle is a
common partition of
A and
B if
Q{\mathcal Q} is a permutation of
P{\mathcal P}. The
minimum common string partition problem (
MCSP) is to find a common partition of two strings
A and
B with the minimum number of blocks. The restricted version of
MCSP where each letter occurs at most
k times in each input string, is denoted by
k-
MCSP.
In this paper, we show that 2-MCSP (and therefore MCSP) is NP-hard and, moreover, even APX-hard. We describe a 1.1037-approximation for 2-MCSP and a linear time 4-approximation algorithm for 3-MCSP. We are not aware of any better approximations.