Protein threading is one of the most successful protein structure prediction methods. Most protein threading methods use a
scoring function that linearly combines sequence and structure features to measure the quality of a sequence-template alignment,
so that a dynamic programming algorithm can be used to optimize the scoring function. However, a linear scoring function cannot
fully exploit interdependencies among features and thus limits alignment accuracy.
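
As an illustration of the setting described above, the following is a minimal sketch of a linear scoring function optimized by dynamic programming. The feature function, the weights, and the constant gap penalty are hypothetical choices made for this example only; they are not the features or parameters of any particular threading program.

```python
from typing import Callable, List, Sequence


def linear_score(features: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scoring function: a weighted sum of feature values."""
    return sum(w * f for w, f in zip(weights, features))


def best_alignment_score(
    n: int,
    m: int,
    match_score: Callable[[int, int], float],
    gap_penalty: float,
) -> float:
    """Global alignment by dynamic programming.

    dp[i][j] holds the best score for aligning the first i sequence
    residues to the first j template positions.
    """
    dp: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] - gap_penalty
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] - gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + match_score(i - 1, j - 1),  # align i with j
                dp[i - 1][j] - gap_penalty,                    # gap in template
                dp[i][j - 1] - gap_penalty,                    # gap in sequence
            )
    return dp[n][m]


def make_toy_scorer(seq: str, template: str) -> Callable[[int, int], float]:
    """Hypothetical features: [identity indicator, mismatch indicator]."""
    weights = [1.0, -1.0]  # assumed weights, for illustration only

    def score(i: int, j: int) -> float:
        same = 1.0 if seq[i] == template[j] else 0.0
        return linear_score([same, 1.0 - same], weights)

    return score
```

Because the total score is a sum of per-position terms, the recurrence finds the optimal alignment in O(nm) time. Each term, however, can only weight features independently of one another.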
This paper presents a nonlinear scoring function for protein threading that can not only model interactions among different
protein features but can also be optimized efficiently using a dynamic programming algorithm. We achieve this by modeling
the threading problem with a probabilistic graphical model, the Conditional Random Field (CRF), and training the model using the
gradient tree boosting algorithm. The resulting model is a nonlinear scoring function consisting of a collection of regression
trees. Each regression tree models a type of nonlinear relationship among sequence and structure features. Experimental results
indicate that this new threading model can effectively leverage weak biological signals and substantially improve both
alignment accuracy and fold recognition rate.
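
To make the idea of a scoring function built from regression trees concrete, here is a minimal sketch, assuming a hand-built toy tree rather than one learned by gradient tree boosting. The `Node` class, the `tree_score` function, and the two features are hypothetical names introduced for this example; the training procedure is not shown.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """A regression tree node; a node with feature=None is a leaf."""
    value: float = 0.0
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def predict(self, x: Sequence[float]) -> float:
        if self.feature is None:
            return self.value
        child = self.left if x[self.feature] <= self.threshold else self.right
        return child.predict(x)


def tree_score(trees: Sequence[Node], x: Sequence[float]) -> float:
    """Nonlinear score of one alignment state: the sum of all tree outputs."""
    return sum(tree.predict(x) for tree in trees)


# Toy tree (assumed, not learned): reward a state only when BOTH
# features exceed 0.5, an interaction that a weighted sum cannot express.
both_high = Node(
    feature=0,
    threshold=0.5,
    left=Node(value=0.0),
    right=Node(
        feature=1,
        threshold=0.5,
        left=Node(value=0.0),
        right=Node(value=2.0),
    ),
)
```

Since the score is still evaluated per alignment state and summed along the alignment, it can stand in for a linear per-position score in a dynamic programming recurrence without changing the recurrence itself.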
Keywords: protein threading, conditional random fields, gradient tree boosting, regression tree, nonlinear scoring function