Volume 37, Number 2, 427-433, DOI: 10.1007/s00726-008-0172-0

A complexity-based method for predicting protein subcellular location

Xiaoqi Zheng, Taigang Liu and Jun Wang

Abstract

A complexity-based approach is proposed to predict the subcellular location of proteins. Instead of extracting features from protein sequences, as previous methods do, our approach is based on a complexity decomposition of symbol sequences. In the first step, the distance between each pair of protein sequences is evaluated by the conditional complexity of one sequence given the other. The subcellular location of a protein is then determined using the k-nearest neighbor algorithm. On three widely used data sets created by Reinhardt and Hubbard, Park and Kanehisa, and Gardy et al., our approach shows an improvement in prediction accuracy over methods based on the amino acid composition and on a Markov model of protein sequences.
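
The two steps described above (a pairwise distance from conditional complexity, then a k-nearest neighbor vote) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the complexity measure or the exact distance formula, so this sketch assumes zlib-compressed length as a stand-in for sequence complexity, approximates the conditional complexity of x given y as C(yx) - C(y), and uses an assumed symmetric normalization. The function names, the toy sequences, and their labels are all hypothetical.

```python
import zlib
from collections import Counter


def compressed_size(seq: str) -> int:
    """Length in bytes of the zlib-compressed sequence (a crude complexity proxy)."""
    return len(zlib.compress(seq.encode("ascii"), 9))


def conditional_complexity(x: str, y: str) -> int:
    """Approximate the complexity of x given y as C(y + x) - C(y)."""
    return compressed_size(y + x) - compressed_size(y)


def distance(x: str, y: str) -> float:
    """Symmetric, normalized distance (an assumed form, not the paper's definition)."""
    numerator = max(conditional_complexity(x, y), conditional_complexity(y, x))
    return numerator / max(compressed_size(x), compressed_size(y))


def knn_predict(query: str, training: list, k: int = 1) -> str:
    """Return the majority label among the k training sequences nearest to query."""
    neighbors = sorted(training, key=lambda item: distance(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Made-up sequences and labels, for illustration only (not from any data set).
TOY_TRAINING = [
    ("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV",
     "cytoplasmic"),
    ("GSHMLEDPVWCNRYTFQGHHHPPPCCWWMMYYTTNNDDEEFFGGIILLKKRRSSQQAAVVHHCCWWPPMM",
     "extracellular"),
]

if __name__ == "__main__":
    query = TOY_TRAINING[0][0]
    print(knn_predict(query, TOY_TRAINING, k=1))
```

A general-purpose compressor is only a rough proxy on short protein sequences; the sketch is meant to show the shape of the pipeline, in which no feature vector is ever extracted and classification relies on pairwise distances alone.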

Keywords: Protein subcellular location - Symbol sequence complexity - k-Nearest neighbor algorithm - Jackknife analysis
