It is easy to design on-line learning algorithms for learning k-out-of-n-variable monotone disjunctions by simply keeping one weight per disjunction. Such algorithms use roughly O(n^k)
weights, which can be prohibitively expensive. Surprisingly, algorithms like Winnow require only n weights (one per variable), and the mistake bound of these algorithms is not too much worse than the mistake bound of the
more costly algorithms. The purpose of this paper is to investigate how the exponentially many weights can be collapsed into
only O(n) weights. In particular, we consider probabilistic assumptions that enable the Bayes optimal algorithm’s posterior over the
disjunctions to be encoded with only O(n) weights. This results in a new O(n)-weight algorithm for learning disjunctions which is related to Bylander’s BEG algorithm, originally introduced for linear regression.
Besides providing a Bayesian interpretation for this new algorithm, we are also able to obtain mistake bounds for the
noise-free case resembling those that have been derived for the Winnow algorithm. The same techniques used to derive this new algorithm
also provide a Bayesian interpretation for a normalized version of Winnow.
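
For orientation, the one-weight-per-variable idea behind Winnow can be sketched as follows. This is a minimal illustration of a standard textbook Winnow variant (threshold n, promotion and demotion by a factor of 2), not the new algorithm of this paper; the function names winnow_predict and winnow_run, the threshold, and the factor are assumptions chosen for the example.

```python
from itertools import product


def winnow_predict(weights, x, threshold):
    """Predict 1 if the total weight of the active variables reaches the threshold."""
    active_weight = sum(w for w, xi in zip(weights, x) if xi)
    return 1 if active_weight >= threshold else 0


def winnow_run(examples, n, weights=None, alpha=2.0):
    """One on-line pass over (x, y) pairs; returns (weights, number of mistakes).

    Assumed variant: all weights start at 1, the threshold is n, and on a
    mistake the weights of the active variables are multiplied by alpha
    (false negative) or divided by alpha (false positive).
    """
    weights = [1.0] * n if weights is None else list(weights)
    threshold = float(n)
    mistakes = 0
    for x, y in examples:
        if winnow_predict(weights, x, threshold) != y:
            mistakes += 1
            factor = alpha if y == 1 else 1.0 / alpha
            weights = [w * factor if xi else w for w, xi in zip(weights, x)]
    return weights, mistakes


# Illustrative noise-free target: the monotone disjunction x0 OR x2 over n = 8 variables.
n = 8
relevant = (0, 2)
examples = [(x, int(any(x[i] for i in relevant))) for x in product((0, 1), repeat=n)]

# Cycle through the examples until a full pass causes no mistake.
weights, total_mistakes = None, 0
for _ in range(100):
    weights, m = winnow_run(examples, n, weights)
    total_mistakes += m
    if m == 0:
        break
```

The learner stores only n numbers, in contrast to the roughly O(n^k) weights needed when one weight is kept per candidate disjunction.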
The first and third authors are supported by NSF grant CCR 9700201. The second author is supported by a research fellowship
from the University of Milan and by a Eurocolt grant.