Efficient Computation of Long Similar Subsequences
Abdullah N. Arslan6
and Ömer Eğecioğlu6 
| (6) |
Department of Computer Science, University of California, Santa Barbara, Santa Barbara, CA 93106, USA |
Abstract
Given sequences X of length n and Y of length m with n ≥ m, let LAt* and NLAt* denote the maximum ordinary, and maximum length normalized scores of local alignments with length at least a given threshold value t. The alignment length is defined as the sum of the lengths of the involved subsequences, and length normalized score of an
alignment is the quotient of the ordinary score by the alignment length. We develop an algorithm which finds an alignment
with ordinary score ≥ LAt*, and length ≥ (1 - 1/r)t for a given r, in time O(rnm) and space O(rm). The algorithm can be used to find an alignment with length normalized score > λ for a given positive λ with the same time
and space complexity and within the same approximation bounds. Thus this algorithm provides a length-approximate answer to
a query such as “Do X and Y share a (sufficiently long) fragment with more than 70% of similarity?” We also show that our approach gives improved approximation
algorithms for the normalized local alignment problem. In this case we can efficiently find an alignment with length ≥ (1 - 1/r)t which has a length normalized score ≥ NLAt*.
Keywords Local alignment - normalized local alignment - approximation algorithm - dynamic programming - ratio maximization
Supported in part by NSF Grants No. CCR-9821038 and No. EIA-9818320.
References secured to subscribers.