Welcome!
To use the personalized features of this site, please log in or register.
If you have forgotten your username or password, we can help.
|
 |
Statistical Identification of Uniformly Mutated Segments within Repeats
| |
|
Statistical Identification of Uniformly Mutated Segments within Repeats
S. Cenk Ṣahinalp6 , Evan Eichler7 , Paul Goldberg8 , Petra Berenbrink9 , Tom Friedetzky10 and Funda Ergun11 
| (6) |
Dept of EECS, Dept of Genetics and Center for Computational Genomics, CWRU, USA |
| (7) |
Dept of Genetics and Center for Computational Genomics, CWRU, USA |
| (8) |
Dept of Computer Science, University of Warwick, UK |
| (9) |
School of Computing, Simon Fraser University, Canada |
| (10) |
Pacific Institute of Mathematics, Simon Fraser University, Canada |
| (11) |
NEC Research Institute and Dept of EECS, CWRU, USA |
Abstract
Given a long string of characters from a constant size (w.l.o.g. binary) alphabet we present an algorithm to determine whether
its characters have been generated by a single i.i.d. random source. More specifically, consider all possible k-coin models for generating a binary string S, where each bit of S is generated via an independent toss of one of the k coins in the model. The choice of which coin to toss is decided by a random walk on the set of coins where the probability
of a coin change is much lower than the probability of using the same coin repeatedly. We present a statistical test procedure
which, for any given S, determines whether the a posteriori probability for k = 1 is higher than for any other k > 1. Our algorithm runs in time O( l
4 log l), where l is the length of S, through a dynamic programming approach which exploits the convexity of the a posteriori probability for k.
The problem we consider arises from two critical applications in analyzing long alignments between pairs of genomic sequences.
A high alignment score between two DNA sequences usually indicates an evolutionary relationship, i.e. that the sequences have
been generated as a result of one or more copy events followed by random point mutations. Such sequences may include functional
regions (e.g. exons) as well as nonfunctional ones (e.g. introns). Functional regions with critical importance exhibit much
lower mutation rates than non-functional DNA (or DNA
Supported in part by an NSF Career Award and by Charles B. Wang Foundation.
Partially supported by the IST Programme of the EU under contract number IST-1999-14186 (ALCOM-FT).
Fulltext Preview (Small, Large)
 References secured to subscribers.
|
|
|
|
|
|