Welcome!
To use the personalized features of this site, please log in or register.
If you have forgotten your username or password, we can help.
My Menu
Saved Items

Statistical Identification of Uniformly Mutated Segments within Repeats

S. Cenk ṢahinalpContact Information, Evan EichlerContact Information, Paul GoldbergContact Information, Petra BerenbrinkContact Information, Tom Friedetzky10 Contact Information and Funda Ergun11 Contact Information

(6)  Dept of EECS, Dept of Genetics and Center for Computational Genomics, CWRU, USA
(7)  Dept of Genetics and Center for Computational Genomics, CWRU, USA
(8)  Dept of Computer Science, University of Warwick, UK
(9)  School of Computing, Simon Fraser University, Canada
(10)  Pacific Institute of Mathematics, Simon Fraser University, Canada
(11)  NEC Research Institute and Dept of EECS, CWRU, USA
Abstract
Given a long string of characters from a constant size (w.l.o.g. binary) alphabet we present an algorithm to determine whether its characters have been generated by a single i.i.d. random source. More specifically, consider all possible k-coin models for generating a binary string S, where each bit of S is generated via an independent toss of one of the k coins in the model. The choice of which coin to toss is decided by a random walk on the set of coins where the probability of a coin change is much lower than the probability of using the same coin repeatedly. We present a statistical test procedure which, for any given S, determines whether the a posteriori probability for k = 1 is higher than for any other k > 1. Our algorithm runs in time O(l 4 log l), where l is the length of S, through a dynamic programming approach which exploits the convexity of the a posteriori probability for k.
The problem we consider arises from two critical applications in analyzing long alignments between pairs of genomic sequences. A high alignment score between two DNA sequences usually indicates an evolutionary relationship, i.e. that the sequences have been generated as a result of one or more copy events followed by random point mutations. Such sequences may include functional regions (e.g. exons) as well as nonfunctional ones (e.g. introns). Functional regions with critical importance exhibit much lower mutation rates than non-functional DNA (or DNA
Supported in part by an NSF Career Award and by Charles B. Wang Foundation.
Partially supported by the IST Programme of the EU under contract number IST-1999-14186 (ALCOM-FT).

Contact Information S. Cenk Ṣahinalp
Email: cenk@cwru.edu

Contact Information Evan Eichler
Email: eee@po.cwru.edu

Contact Information Paul Goldberg
Email: pwg@dcs.warwick.ac.uk

Contact Information Petra Berenbrink
Email: petra@cs.sfu.ca

Contact Information Tom Friedetzky
Email: tf@pims.math.ca

Contact Information Funda Ergun
Email: ergun@research.nj.nec.com
Fulltext Preview (Small, Large)
Image of the first page of the fulltext

References secured to subscribers.



Export this chapter
Export this chapter as RIS | Text
 
Remote Address: 38.107.191.109 • Server: mpweb23
HTTP User Agent: CCBot/1.0 (+http://www.commoncrawl.org/bot.html)