View Related Documents

Abstract

Searching for the longest common substring (LCS) of biosequences is one of the most important tasks in Bioinformatics. A fast algorithm for LCS problem named FAST_LCS is presented. The algorithm first seeks the successors of the initial identical character pairs according to a successor table to obtain all the identical pairs and their levels. Then by tracing back from the identical character pair at the largest level, the result of LCS can be obtained. For two sequences X and Y with lengths n and m, the memory required for FAST_LCS is max{8*(n+1)*8*(m*1),L}, here L is the number of identical character pairs and time complexity of parallel implementation is O(|LCS(X,Y)|), here, |LCS(X,Y)| is the length of the LCS of X,Y. Experimental result on the gene sequences of tigr database using MPP parallel computer Shenteng 1800 shows that our algorithm can get exact correct result and is faster and more efficient than other LCS algorithms.

Keywords  bioinformatics - longest common subsequence - identical character pair

Fulltext Preview

Image of the first page of the fulltext document