Abstract

Measures relating word frequencies and expectations have been of constant interest in bioinformatics studies. With sequence data becoming massively available, exhaustive enumeration of such measures has become conceivable, and yet poses a significant computational burden even when limited to words of bounded maximum length. In addition, the display of the huge tables possibly resulting from these counts poses practical problems of visualization and inference.
Verbumculus is a suite of software tools for the efficient and fast detection of over- or under-represented words in nucleotide sequences. The inner core of Verbumculus rests on subtly interwoven properties of statistics, pattern matching and combinatorics on words, which enable one to limit drastically and a priori the set of over- or under-represented candidate words of all lengths in a given sequence, thereby rendering it more feasible both to detect and visualize such words in a fast and practically useful way. This paper first describes the facility and then reports experimental results, ranging from simulations on synthetic data to the discovery of regulatory elements in the upstream regions of a set of yeast genes.
The software Verbumculus is accessible at http://www.cs.ucr.edu/~stelo/Verbumculus/ or http://wwwdbl.dei.unipd.it/Verbumculus/

Keywords  Verbumculus - unusual words - subword statistics - pattern discovery - regulatory elements - suffix trees

Supported in part by the NSF of U.S.A. (Grant No. CCR-9700276), Purdue Research Foundation (Grant No. 690-1398-3145), the Italian Ministry of University and Research, and the Research Program of the University of Padova.
Supported by Purdue Research Foundation (Grant No. 690-1398-3145), the Italian Ministry of University and Research, the Research Program of the University of Padova, and Bourns College of Engineering, University of California, Riverside.
Alberto Apostolico (Dr. Eng., 1973, Univ. Naples) is a professor of computer engineering at Univ. Padova and professor of computer sciences at Purdue University. He was a Fulbright scholar at CMU in 1974–75, has held visiting and permanent positions in the U.S. (UIUC, Rensselaer, Purdue, IBM) and Europe (U. of Salerno, U. of L'Aquila, LASI, U. of Paris, U. of London, King's Zif-Bielefeld, Renyi-Hungarian Acad of Science), and has been a full professor in Italy since 1987, at DEI since 1992. His research interests are algorithmic analysis and design, with emphasis on pattern matching, on which subject he has authored more than 100 papers and co-authored or edited 7 volumes. He serves on the Editorial Boards of Theor. Comp. Sci., Par. Proc. Let., J. of Comp. Biol., Chaos Th. and Appl., Springer Lecture Notes in Bioinformatics, and Algorithmica (g.e.). He has been a keynote speaker at over 60 and a PC member for over 50 international conferences. He has been a reviewer for NSF, Canadian SERC, NATO, HSFP, Finland Acad Sci, and the Hong Kong and Israel Science Councils. He is a current or past member of ACM, AICA, EATCS, and IEEE. He has been (co-)recipient of U.S. (NSF, AFOSR, NIH), French, British, Italian (CNR, MURST, MIUR) and international (Fulbright, NATO, ESPRIT) grants, and of an IBM Faculty Award in 2002.
Fang-Cheng Gong (Ph.D., 1995, Dept. Plant Sciences, Univ. Arizona) held positions as graduate research associate at the Dept. Plant Sciences, Univ. Arizona, postdoctoral fellow at the Dept. Plant Pathology, Univ. Arizona, and postdoctoral fellow at the Dept. Biological Sciences, Purdue University, before joining the research staff of Celera Genomics. His research interests include plant molecular biology, microbial molecular genetics, and plant cell biology.
Stefano Lonardi (Ph.D., 2001, Purdue University) is an assistant professor of computer science & engineering at the Univ. California, Riverside. His research is currently focused on bioinformatics, data compression, information hiding, and data mining. He received his “Laurea” degree from Univ. Pisa in 1994 and his Ph.D. degree in computer science from Purdue University. He also holds a Research Doctorate from the Univ. Padua (1999). He is a member of ACM, IEEE, the Upsilon Pi Epsilon and Phi Kappa Phi honor societies, and the International Society for Computational Biology.
