Lecture Notes in Computer Science, 2007, Volume 4366/2007, 107-118, DOI: 10.1007/978-3-540-71037-0_7

Enhancing Coding Potential Prediction for Short Sequences Using Complementary Sequence Features and Feature Selection

Yvan Saeys and Yves Van de Peer

Abstract

The identification of coding potential in DNA sequences is of major importance in bioinformatics, where it is often used to assist expert systems that automatically try to recognize genes in genomes. For longer sequences, identifying coding potential tends to be easier because of a better signal-to-noise ratio, whereas for very short sequences the problem becomes considerably harder. In this paper, we present new methods that specifically aim at a better prediction of coding potential in short sequences. To this end, we combine different, complementary sequence features with a feature selection strategy. Results comparing the new classifiers to state-of-the-art models show that our new approach significantly outperforms the existing methods when applied to short sequences.
