Phylogenetic footprinting is a widely used approach for the prediction of transcription factor binding sites (TFBSs) through
identification of conserved motifs in the upstream sequences of orthologous genes in eukaryotic genomes. However, this popular
strategy may not be directly applicable to prokaryotic genomes, where typically about half of the genes in a genome form multiple-gene
transcription units or operons. The promoter sequences for these operons are located in the inter-operonic rather than inter-genic
regions, which requires prediction of TFBSs at the level of the transcription unit rather than the individual gene. We have formulated
the identification of conserved operons (including both single-gene and multi-gene operons) whose individual gene members are orthologous
between two genomes as a bipartite graph matching problem, and we present a graph-theoretic solution. By applying this
method to Escherichia coli K12 and 11 of its phylogenetically neighboring species, we have predicted 2,478 sets of conserved operons and discovered potential
binding motifs for each of these operons. By comparing the prediction results of our approach with those of other prediction approaches,
we conclude that it is advantageous to use our approach for prediction of cis-regulatory binding sites in prokaryotes. The prediction
software package PFP is available at http://csbl.bmb.uga.edu/~dongsheng/PFP.
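
To illustrate the bipartite formulation, the following is a minimal sketch and not the PFP implementation. It assumes that operon memberships and gene-level ortholog pairs are already known, that two operons are linked when at least one pair of their genes is orthologous, and that a maximum-cardinality matching found by augmenting paths is an acceptable stand-in for the graph-theoretic solution described above. All names and data (build_edges, max_matching, opA1, a1, and so on) are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def build_edges(
    operons_a: Dict[str, List[str]],
    operons_b: Dict[str, List[str]],
    orthologs: Dict[str, Iterable[str]],
) -> Dict[str, Set[str]]:
    """Link an operon in genome A to an operon in genome B when at least
    one gene pair (one gene from each operon) is orthologous.
    This linking rule is an assumption made for illustration."""
    gene_to_operon_b = {g: name for name, genes in operons_b.items() for g in genes}
    edges: Dict[str, Set[str]] = {name: set() for name in operons_a}
    for name_a, genes in operons_a.items():
        for gene in genes:
            for partner in orthologs.get(gene, ()):
                if partner in gene_to_operon_b:
                    edges[name_a].add(gene_to_operon_b[partner])
    return edges


def max_matching(edges: Dict[str, Set[str]]) -> Dict[str, str]:
    """Maximum-cardinality bipartite matching via augmenting paths
    (Kuhn's algorithm). Returns a mapping from A operons to B operons."""
    match_b: Dict[str, str] = {}  # operon in B -> operon in A

    def try_augment(a: str, seen: Set[str]) -> bool:
        for b in sorted(edges[a]):
            if b in seen:
                continue
            seen.add(b)
            # Take b if it is free, or if its current partner can move elsewhere.
            if b not in match_b or try_augment(match_b[b], seen):
                match_b[b] = a
                return True
        return False

    for a in edges:
        try_augment(a, set())
    return {a: b for b, a in match_b.items()}


# Hypothetical toy data: two small genomes with made-up gene names.
operons_a = {"opA1": ["a1", "a2"], "opA2": ["a3"], "opA3": ["a4"]}
operons_b = {"opB1": ["b1", "b2"], "opB2": ["b3"]}
orthologs = {"a1": ["b1"], "a2": ["b2"], "a3": ["b1", "b3"], "a4": ["b9"]}

pairs = max_matching(build_edges(operons_a, operons_b, orthologs))
print(sorted(pairs.items()))
```

In this toy input, opA3 stays unmatched because its only ortholog does not belong to any operon in the second genome. A weighted variant, for example scoring each edge by the number of orthologous gene pairs shared by the two operons, would be a natural refinement, but whether it corresponds to the method used here is not stated in this abstract.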