Abstract

We explore the problem of designing oligonucleotides that help locate organisms along a known phylogenetic tree. We develop a suffix-tree-based algorithm to find such short sequences efficiently. In the worst case, our algorithm requires O(Nm) time and O(N) space, where m is the number of genomes classified by the phylogeny and N is their total length. We implemented our algorithm and used it to find these discriminating sequences in both small and large phylogenies. We believe our algorithm will have wide applications, including high-throughput classification and identification, design of oligo arrays that optimally differentiate genes in gene families, and markers for closely related strains and populations. It will also have scientific significance as a new way to assess the confidence in a given classification.
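
To make the notion of a discriminating sequence concrete, the following is a minimal brute-force sketch, not the suffix-tree algorithm described above. It assumes that a discriminating oligonucleotide for a clade is a length-k substring present in every genome inside the clade and absent from every genome outside it; that definition, the fixed length k, and the names `kmers` and `discriminating_kmers` are illustrative assumptions rather than details taken from the paper.

```python
from typing import Dict, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def discriminating_kmers(genomes: Dict[str, str], clade: Set[str], k: int) -> Set[str]:
    """Return k-mers found in every genome of `clade` and in no other genome.

    Hypothetical helper: a hash-set baseline for illustration only.
    It does not use a suffix tree and makes no claim to match the
    O(Nm) time / O(N) space bounds stated in the abstract.
    """
    inside = [kmers(genomes[name], k) for name in clade]
    if not inside:
        return set()
    # Candidates must occur in every genome of the clade.
    shared = set.intersection(*inside)
    # Discard any candidate that also occurs outside the clade.
    for name, seq in genomes.items():
        if name not in clade:
            shared -= kmers(seq, k)
    return shared


# Toy data (made up for this example): A and B form a clade, C is an outgroup.
genomes = {"A": "ACGTAC", "B": "TCGTAG", "C": "GGGCCC"}
print(sorted(discriminating_kmers(genomes, {"A", "B"}, 3)))  # ['CGT', 'GTA']
```

Running such a check once per internal node of the phylogeny would yield candidate markers for each clade; a suffix-tree approach avoids re-enumerating substrings for each node and each length.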
