One component of the genomic program controlling the transcriptional regulation of genes is the location and arrangement
of transcription factors bound to the promoter and enhancer regions of a gene. Because the genomic locations of the functional
binding sites of most transcription factors are not yet known, predicting them is of great importance. Unfortunately, it is
well known that the low specificity of the binding of transcription factors to DNA makes such prediction, using position-specific
probability matrices (motifs) alone, subject to huge numbers of false positives. One approach to alleviating this problem
has been to use phylogenetic “shadowing” or “footprinting” to remove unconserved regions of the genome from consideration.
Another approach has been to combine a phylogenetic model and the site-specificity model into a single, predictive model of
conserved binding sites. Both of these approaches are based on alignments of orthologous genomic regions from two or more
species. In this work, we use a simplified, theoretical model to study the statistical power of the latter approach to the
prediction of features such as transcription factor binding sites. We investigate the question of the number of genomes required
at varying evolutionary distances to achieve specified levels of accuracy (false positive and false negative prediction rates).
We show that this depends strongly on the information content of the position-specific probability matrix and on the evolutionary
model. We explore the effects of modifying the structure of the phylogenetic model, and conclude that placing the target genome
at the root of the tree has a negligible effect on the power predicted by the model. Hence, as the root-placed model is much easier
to calculate, we can use it as an approximation to phylogenetic motif scanning using real trees. Finally, we perform an empirical study
and demonstrate that the performance of current phylogenetic motif scanning programs is far from the theoretical limit of
their power, leaving ample room for improvement.