We describe a real-time computer vision and machine learning system for modeling and recognizing human behaviors in a visual
surveillance task. The system is particularly concerned with detecting when interactions between people occur, and classifying
the type of interaction. Examples of interesting interaction behaviors include following another person, altering one's path
to meet another, and so forth. Our system combines top-down with bottom-up information in a closed feedback loop, with both
components employing a statistical Bayesian approach. We propose and compare two state-based learning architectures,
namely hidden Markov models (HMMs) and coupled hidden Markov models (CHMMs), for modeling behaviors and interactions. The
CHMM is shown to work much more efficiently and accurately than the HMM.
Finally, to address the problem of limited training data, we use a synthetic ‘Alife-style’ (artificial life) training system to develop flexible
prior models for recognizing human interactions. We demonstrate the ability to use these a priori models to accurately classify
real human behaviors and interactions with no additional tuning or training.