Abstract

An important issue in data mining concerns the discovery of patterns whose frequency meets a user-specified minimum support. We generalize this problem by introducing the concept of an ambiguous event. An ambiguous event can be substituted for another without modifying the substance of the pattern concerned. For instance, in molecular biology, researchers attempt to identify conserved patterns in a family of proteins known to have evolved from a common ancestor. Such patterns are flexible in the sense that some residues may have been substituted for others during evolution. For example, the notation A[B C] denotes an ambiguous pattern representing the event A followed by either the event B or the event C. A new scoring scheme, based on substitution matrices, is proposed for computing the frequency of ambiguous patterns. A substitution matrix expresses the probability that one event is replaced by another. We propose an adaptation of the Winepi algorithm [1] to ambiguous events. Finally, we give an application to the discovery of conserved patterns in a particular family of proteins, the cytokine receptors.
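
To make the A[B C] notation and the idea of a substitution-weighted frequency concrete, here is a minimal Python sketch. It is not the scoring scheme of the paper, which the abstract does not specify. It assumes single-character events, contiguous matching, the first event listed at each position acting as the reference event, and an occurrence score equal to the product of substitution probabilities. The names `SUBST`, `parse_pattern`, `occurrence_score` and `weighted_frequency`, as well as the probability values, are hypothetical.

```python
import re

# Hypothetical substitution probabilities, keyed by (reference, observed).
# The values are illustrative only and are not taken from the paper.
SUBST = {
    ("A", "A"): 1.0,
    ("B", "B"): 1.0,
    ("B", "C"): 0.5,
}


def parse_pattern(pattern):
    """Turn 'A[B C]' into [('A',), ('B', 'C')], one tuple per position."""
    tokens = re.findall(r"\[([^\]]+)\]|(\S)", pattern)
    return [tuple(group.split()) if group else (single,)
            for group, single in tokens]


def occurrence_score(window, positions, subst):
    """Score one window: 0 if it does not match, otherwise the product of
    the substitution probabilities (assumed scoring rule)."""
    score = 1.0
    for observed, allowed in zip(window, positions):
        if observed not in allowed:
            return 0.0
        # The first event listed at a position is treated as the reference.
        score *= subst.get((allowed[0], observed), 0.0)
    return score


def weighted_frequency(sequence, pattern, subst):
    """Sum of occurrence scores divided by the number of candidate windows.

    Simplification: windows have the length of the pattern and lie fully
    inside the sequence, unlike the sliding windows of Winepi.
    """
    positions = parse_pattern(pattern)
    k = len(positions)
    n_windows = len(sequence) - k + 1
    if n_windows <= 0:
        return 0.0
    total = sum(occurrence_score(sequence[i:i + k], positions, subst)
                for i in range(n_windows))
    return total / n_windows


if __name__ == "__main__":
    print(parse_pattern("A[B C]"))
    print(weighted_frequency("ABACAA", "A[B C]", SUBST))
```

Under these assumptions, an exact occurrence such as AB counts fully, while an occurrence that relies on a substitution such as AC counts only partially, so the resulting frequency can be compared against a minimum support threshold in the usual way.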
