Abstract

We describe a Bayesian inference method for identifying protein-coding regions (active or residual) in DNA or RNA sequences. Its main feature is the way it computes the conditional and a priori probabilities required by Bayes’s formula: each event (a possible annotation of a nucleotide string) is factored into a concatenation of shorter events that are believed to be independent. This factoring allows us to obtain fast but reliable estimates of these probabilities from readily available databases, whereas estimating probabilities for unfactored events would require databases and tables of astronomical size. Tests with natural and artificial genomes gave promising results.
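
The abstract does not spell out the factoring scheme, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the nucleotide string has already been cut into short segments, that each candidate annotation assigns one label per segment, and that segments are independent given their labels. The function name `factored_posterior`, the labels `"C"` (coding) and `"N"` (non-coding), and all probability values are hypothetical and chosen purely for illustration.

```python
from itertools import product
from math import prod


def factored_posterior(segments, labelings, likelihood, prior):
    """Posterior over candidate annotations via Bayes's formula.

    Assumption (illustrative only): the joint probability of an annotation
    and the string factors into a product over short, independent segments:
        P(labels, segments) = prod_i P(label_i) * P(segment_i | label_i)
    """
    joint = {}
    for labels in labelings:
        joint[labels] = prod(
            prior[label] * likelihood[(label, seg)]
            for label, seg in zip(labels, segments)
        )
    total = sum(joint.values())  # denominator of Bayes's formula
    return {labels: p / total for labels, p in joint.items()}


# Hypothetical small tables: one entry per (label, short segment) pair.
# Tables indexed by whole strings would grow exponentially with length.
segments = ["ATG", "TAA"]
prior = {"C": 0.5, "N": 0.5}
likelihood = {
    ("C", "ATG"): 0.04, ("N", "ATG"): 0.01,
    ("C", "TAA"): 0.001, ("N", "TAA"): 0.02,
}
labelings = list(product("CN", repeat=len(segments)))

posterior = factored_posterior(segments, labelings, likelihood, prior)
best = max(posterior, key=posterior.get)
```

With these made-up numbers, the annotation with the highest posterior labels the first segment as coding and the second as non-coding. The tables needed grow with the number of distinct short segments rather than with the number of possible full-length strings, which is the saving the abstract refers to.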

Keywords: coding regions, ab-initio DNA tagging, Bayesian inference
