In the
l -phylogeny problem, one wishes to construct an evolutionary tree for a set of species represented by characters, in which
each state of each character induces no more than
l connected components. We consider the fixed-topology version of this problem for fixed-topologies of arbitrary degree. This
version of the problem is known to be
NP -complete for
l ≥ 3 even for degree-
3 trees in which no state labels more than
l+1 leaves (and therefore there is a trivial
l + 1 phylogeny). We give a
2 -approximation algorithm for all
l ≥ 3 for arbitrary input topologies and we give an optimal approximation algorithm that constructs a
4 -phylogeny when a
3 -phylogeny exists. Dynamic programming techniques, which are typically used in fixed-topology problems, cannot be applied
to
l -phylogeny problems. Our
2 -approximation algorithm is the first application of linear programming to approximation algorithms for phylogeny problems.
We extend our results to a related problem in which characters are polymorphic.
Key words. Phylogeny, Approximation algorithm, Computational biology.
Received June 9, 1997; revised March 13, 1998.