Volume 16, Number 3, 229-247, DOI: 10.1007/s10878-007-9129-6

A new recombination lower bound and the minimum perfect phylogenetic forest problem

Yufeng Wu and Dan Gusfield

From the issue entitled "Special Issue: Selected Papers from COCOON 2007; Guest Editors: Guohui Lin and Zhipeng Cai"

View Related Documents

Abstract

Understanding recombination is a central problem in population genetics. In this paper, we address an established computational problem in this area: compute lower bounds on the minimum number of historical recombinations for generating a set of sequences (Hudson and Kaplan in Genetics 111, 147–164, 1985; Myers and Griffiths in Genetics 163, 375–394, 2003; Gusfield et al. in Discrete Appl. Math. 155, 806–830, 2007; Bafna and Bansal in IEEE/ACM Trans. Comput. Biol. Bioinf. 1, 78–90, 2004 and in J. Comput. Biol. 13, 501–521, 2006; Song et al. in Bioinformatics 421, i413–i244, 2005). In particular, we propose a new recombination lower bound: the forest bound. We show that the forest bound can be formulated as the minimum perfect phylogenetic forest problem, a natural extension to the classic binary perfect phylogeny problem, which may be of interests on its own. We then show that the forest bound is provably higher than the optimal haplotype bound (Myers and Griffiths in Genetics 163, 375–394, 2003), a very good lower bound in practice (Song et al. in Bioinformatics 421, i413–i422, 2005). We prove that, like several other lower bounds (Bafna and Bansal in J. Comput. Biol. 13, 501–521, 2006), computing the forest bound is NP-hard. Finally, we describe an integer linear programming (ILP) formulation that computes the forest bound precisely for certain range of data. Simulation results show that the forest bound may be useful in computing lower bounds for low quality data.

Keywords  Recombination - Lower bound on the minimum number of recombination - Ancestral recombination graph - Population genetics - Computational complexity

A preliminary version of this paper appeared in the Proceedings of COCOON 2007, LNCS, vol. 4598, pp. 16–26.
The work was performed while Y. Wu was with UC Davis and supported by grants CCF-0515278 and IIS-0513910 from National Science Foundation.
D. Gusfield supported by grants CCF-0515278 and IIS-0513910 from National Science Foundation.

Fulltext Preview

Image of the first page of the fulltext document