The increasing number of sequenced genomes motivates the use of evolutionary patterns to detect genes. We present a series
of comparative methods for gene finding in homologous prokaryotic or eukaryotic sequences. Based on a model of legal genes
and a similarity measure between genes, we find the pair of legal genes of maximum similarity. We develop methods based on
gene models and alignment-based similarity measures of increasing complexity, which take into account many details of real
gene structures, e.g. the similarity of the proteins encoded by the exons. When using a similarity measure based on an existing
alignment, the methods run in linear time. When the alignment and prediction processes are integrated, which allows for more
fine-grained similarity measures, the methods run in quadratic time. We evaluate the methods in a series of experiments on synthetic
and real sequence data, which show that all methods are competitive but that taking the similarity of the encoded proteins
into account boosts performance substantially.
Partially supported by the Future and Emerging Technologies programme of the EU under contract number IST-1999-14186 (ALCOM-FT).
Bioinformatics Research Center (BiRC), www.birc.dk, funded by the Aarhus University Research Foundation.