We present the aspect Bernoulli (AB) model, a probabilistic multiple cause model for the analysis of binary (0–1) data. A
distinctive feature of the model is its ability to automatically detect and distinguish between “true absences” and “false absences” (both
of which are coded as 0 in the data), and similarly, between “true presences” and “false presences” (both of which are coded
as 1). This is accomplished by dedicated additive noise components that explicitly account for such non-content-bearing causes.
The AB model is thus suited to noise removal and data explanation, including the detection of omissions and additions. An important
application of AB that we demonstrate is data-driven reasoning about palaeontological recordings. We also report results on
recovering corrupted handwritten digit images and expanding short text documents, and we compare AB with other methods and
discuss the differences.
Keywords: Data mining, Probabilistic latent variable models, Multiple cause models, 0–1 data
Part of Ella Bingham's work was performed while she was visiting the School of Computer Science, University of Birmingham,
UK.