This paper proposes a method for generating classifiers from large datasets by building a committee of simple base classifiers
using a standard boosting algorithm. It permits the processing of large datasets even if the underlying base learning algorithm
cannot do so efficiently. The basic idea is to split incoming data into chunks and build a committee from classifiers
trained on these individual chunks. Our method extends earlier work by adaptively pruning the committee.
This is essential when applying the algorithm in practice because it dramatically reduces the algorithm’s running time and
memory consumption. It also makes it possible to efficiently “race” committees corresponding to different chunk sizes. This
is important because our empirical results show that the accuracy of the resulting committee can vary significantly with the
chunk size. They also show that pruning is indeed crucial for making the method practical for large datasets in terms of running
time and memory requirements. Surprisingly, the results demonstrate that pruning can also improve accuracy.
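
The chunk-and-committee idea can be illustrated with a minimal Python sketch. It is not the algorithm evaluated in the paper: the base learner (`train_stump`, a one-feature threshold classifier), the unweighted majority vote, and the validation-based acceptance rule are all assumptions chosen for brevity, standing in for the boosting and adaptive pruning steps. Likewise, `race` simply builds one committee per chunk size in turn and keeps the most accurate one, rather than running them concurrently.

```python
from collections import Counter
from itertools import islice


def chunks(stream, size):
    """Yield successive lists of up to `size` examples from an iterable."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def train_stump(chunk):
    """Hypothetical base learner: best single threshold on (x, y) pairs, y in {0, 1}."""
    best = None
    for threshold in sorted({x for x, _ in chunk}):
        for above in (0, 1):  # label predicted when x >= threshold
            errors = sum((above if x >= threshold else 1 - above) != y
                         for x, y in chunk)
            if best is None or errors < best[0]:
                best = (errors, threshold, above)
    _, best_threshold, best_above = best
    return lambda x: best_above if x >= best_threshold else 1 - best_above


def vote(committee, x):
    """Unweighted majority vote over all committee members."""
    return Counter(member(x) for member in committee).most_common(1)[0][0]


def accuracy(committee, data):
    return sum(vote(committee, x) == y for x, y in data) / len(data)


def build_committee(stream, chunk_size, validation):
    """Train one classifier per chunk; keep it only if validation accuracy improves.

    The acceptance test is a simple stand-in for adaptive pruning (an assumption).
    """
    committee, best_acc = [], 0.0
    for chunk in chunks(stream, chunk_size):
        candidate = committee + [train_stump(chunk)]
        acc = accuracy(candidate, validation)
        if acc > best_acc:
            committee, best_acc = candidate, acc
    return committee, best_acc


def race(data, chunk_sizes, validation):
    """Build one committee per chunk size and return the most accurate one."""
    results = {size: build_committee(data, size, validation)
               for size in chunk_sizes}
    best_size = max(results, key=lambda size: results[size][1])
    return best_size, results[best_size]
```

Because a candidate is discarded whenever it fails to improve validation accuracy, the committee in this sketch stays small however many chunks arrive, which is the property that keeps racing several chunk sizes affordable.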