Biological macromolecules, i.e., DNA, RNA and proteins, are coded by strings called primary structures. Over the last decades, the number and the complexity of known primary structures have grown exponentially. Analyzing this huge volume of data to extract pertinent knowledge is a challenging task, and data mining approaches can help to reach this goal. In this paper, we present a new data mining approach, called
Disclass, based on vote strategies, for the classification of primary structures. Let f1, f2, ..., fn be families that represent, respectively, n samples of n sets S1, S2, ..., Sn of primary structures, and let w be a new primary structure that is assumed to belong to one of these n sets. Disclass decides which of the sets S1, S2, ..., Sn the new primary structure w is assigned to as follows: (i) In the first step, for each family fi, 1 ≤ i ≤ n, we construct the ambiguously discriminant and minimal substrings (ADMS) associated with this family. Because the family fi is a sample of the set Si, the ADMS obtained are also considered to be associated with the whole set Si. During the classification process, the ADMS associated with the set Si that are approximate substrings of w vote, with weighted voices, for the set Si. (ii) In the second step, we compute, according to a vote strategy, the voice weights of the different ADMS constructed during the first step. (iii) Finally, in the last step, we assign w to the set that receives the maximum total weight of voices.
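The three steps can be illustrated with a minimal Python sketch. It is not the Disclass implementation, and every definition in it is a simplifying assumption made for illustration: a substring counts as discriminant for a family if it occurs in at least a fraction alpha of that family's sequences and in at most a fraction beta of the sequences of the other families; it is minimal if it contains no other discriminant substring; an approximate substring is one that matches some window of w with at most max_mismatches substitutions; and the vote strategy is a caller-supplied weight function, here defaulting to the substring length. All function and parameter names are hypothetical.

```python
def substrings(s, min_len, max_len):
    """All distinct substrings of s whose length is in [min_len, max_len]."""
    return {s[i:i + k]
            for k in range(min_len, max_len + 1)
            for i in range(len(s) - k + 1)}


def support(sub, sequences):
    """Fraction of the given sequences that contain sub exactly."""
    return sum(sub in seq for seq in sequences) / len(sequences)


def discriminant_minimal(families, i, min_len=3, max_len=4, alpha=1.0, beta=0.0):
    """Step (i), simplified: substrings frequent in family i, rare elsewhere,
    and containing no other discriminant substring (assumed definitions)."""
    own = families[i]
    others = [seq for j, fam in enumerate(families) if j != i for seq in fam]
    candidates = set().union(*(substrings(seq, min_len, max_len) for seq in own))
    disc = {c for c in candidates
            if support(c, own) >= alpha
            and (not others or support(c, others) <= beta)}
    return {c for c in disc if not any(d != c and d in c for d in disc)}


def occurs_approximately(sub, w, max_mismatches=1):
    """True if some window of w differs from sub in at most max_mismatches positions."""
    k = len(sub)
    return any(sum(a != b for a, b in zip(sub, w[p:p + k])) <= max_mismatches
               for p in range(len(w) - k + 1))


def classify(families, w, weight=len, max_mismatches=1, **params):
    """Steps (ii) and (iii), simplified: sum the weights of the substrings that
    approximately occur in w, then pick the family with the largest total."""
    scores = []
    for i in range(len(families)):
        voters = discriminant_minimal(families, i, **params)
        scores.append(sum(weight(s) for s in voters
                          if occurs_approximately(s, w, max_mismatches)))
    best = max(range(len(families)), key=scores.__getitem__)
    return best, scores


# Toy data, invented for this example only.
families = [["ACGTAC", "TTACGA"], ["GGCTGG", "CTGGAA"]]
print(classify(families, "ATGGCA"))  # -> (1, [3, 6])
```

In this toy run, the substrings retained for the first family are ACG and TAC, and those for the second are CTG and TGG; only ACG votes for the first family (one mismatch against ATG), while both CTG and TGG vote for the second, so w is assigned to the second set. Ties are resolved in favour of the lowest index, which is an arbitrary choice of this sketch.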