Speeding Up HMM Decoding and Training by Exploiting Sequence Repetitions

Yury Lifshits, Shay Mozes, Oren Weimann and Michal Ziv-Ukelson

(1)  California Institute of Technology, 1200 E. California Blvd., Pasadena, CA 91125, USA
(2)  Department of Computer Science, Brown University, Providence, RI 02912-1910, USA
(3)  MIT Computer Science and Artificial Intelligence Laboratory, 32 Vassar Street, Cambridge, MA 02139, USA
(4)  Computer Science Department, Ben Gurion University of the Negev, Beer-Sheva, 84105, Israel

Received: 10 June 2007  Accepted: 5 November 2007  Published online: 28 November 2007

Abstract  We present a method to speed up the dynamic programming algorithms used for solving the HMM decoding and training problems for discrete time-independent HMMs. We discuss the application of our method to Viterbi’s decoding and training algorithms (IEEE Trans. Inform. Theory IT-13:260–269, 1967), as well as to the forward-backward and Baum-Welch (Inequalities 3:1–8, 1972) algorithms. Our approach is based on identifying repeated substrings in the observed input sequence. Initially, we show how to exploit repetitions of all sufficiently small substrings (this is similar to the Four Russians method). Then, we describe four algorithms based respectively on run length encoding (RLE), Lempel-Ziv (LZ78) parsing, grammar-based compression (SLP), and byte pair encoding (BPE). Compared to Viterbi’s algorithm, we achieve speedups of $\Theta(\log n)$ using the Four Russians method, $\Omega(\frac{r}{\log r})$ using RLE, $\Omega(\frac{\log n}{k})$ using LZ78, $\Omega(\frac{r}{k})$ using SLP, and $\Omega(r)$ using BPE, where $k$ is the number of hidden states, $n$ is the length of the observed sequence and $r$ is its compression ratio (under each compression scheme). Our experimental results demonstrate that our new algorithms are indeed faster in practice. We also discuss a parallel implementation of our algorithms.
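The idea of exploiting repetitions can be illustrated with a minimal sketch for the run-length case. It is not the authors' implementation. It assumes the Viterbi recurrence is written as a chain of matrix-vector products in the (max, +) semiring over log-probabilities, so that a run of one symbol repeated many times can be applied as a single matrix power obtained by repeated squaring. All names (`step_matrices`, `mat_pow`, `viterbi_rle`, the toy model) are hypothetical, and the sketch returns only the log-probability of the best state path, not the path itself.

```python
import math
from itertools import groupby

NEG_INF = float("-inf")

def log(p):
    # log of a probability, with log(0) mapped to -infinity
    return math.log(p) if p > 0 else NEG_INF

def step_matrices(trans, emit):
    # M[s][i][j] = log P(state j -> state i) + log P(state i emits symbol s)
    k, n_symbols = len(trans), len(emit[0])
    return [[[log(trans[j][i]) + log(emit[i][s]) for j in range(k)]
             for i in range(k)]
            for s in range(n_symbols)]

def mat_vec(A, v):
    # (max, +) matrix-vector product: one Viterbi step
    return [max(a + x for a, x in zip(row, v)) for row in A]

def mat_mat(A, B):
    # (max, +) matrix-matrix product
    k = len(A)
    return [[max(A[i][m] + B[m][j] for m in range(k)) for j in range(k)]
            for i in range(k)]

def mat_pow(A, e):
    # (max, +) power of A for e >= 1, by repeated squaring
    result, base = None, A
    while e:
        if e & 1:
            result = base if result is None else mat_mat(base, result)
        e >>= 1
        if e:
            base = mat_mat(base, base)
    return result

def viterbi_plain(M, v0, obs):
    # baseline: one matrix-vector product per observed symbol
    v = v0
    for s in obs:
        v = mat_vec(M[s], v)
    return max(v)

def viterbi_rle(M, v0, obs):
    # one matrix power per run of identical symbols
    v = v0
    for s, run in groupby(obs):
        v = mat_vec(mat_pow(M[s], len(list(run))), v)
    return max(v)

# Toy model (assumed values): 2 hidden states, 2 observable symbols.
trans = [[0.9, 0.1], [0.2, 0.8]]   # trans[j][i] = P(j -> i)
emit = [[0.7, 0.3], [0.1, 0.9]]    # emit[i][s] = P(state i emits s)
v0 = [log(0.5), log(0.5)]          # log of the initial distribution
M = step_matrices(trans, emit)
obs = [0] * 7 + [1] * 5 + [0] * 3
```

Because the (max, +) product is associative, `viterbi_plain(M, v0, obs)` and `viterbi_rle(M, v0, obs)` agree up to floating-point rounding. The sketch shows only why repeated substrings can be processed as a unit; whether this saves time depends on the run lengths and on the number of states, since each matrix-matrix product costs more than a matrix-vector product.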

Keywords  HMM - Viterbi - Dynamic programming - Compression

A preliminary version of this paper appeared in Proc. 18th Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 4–15, 2007.
Y. Lifshits’ research was supported by the Center for the Mathematics of Information and the Lee Center for Advanced Networking.
S. Mozes’ work was conducted while visiting MIT.

Yury Lifshits
Email: yury@caltech.edu

Shay Mozes
Email: shay@cs.brown.edu

Oren Weimann
Email: oweimann@mit.edu

Michal Ziv-Ukelson (Corresponding author)
Email: michaluz@cs.bgu.ac.il