The Human Speechome Project is an effort to observe and computationally model the longitudinal course of language development
for a single child at an unprecedented scale. We are collecting audio and video recordings for the first three years of one
child’s life, in its near entirety, as it unfolds in the child’s home. A network of ceiling-mounted video cameras and microphones
is generating approximately 300 gigabytes of observational data each day from the home. One of the world’s largest single-volume
disk arrays is under construction to house approximately 400,000 hours of audio and video recordings that will accumulate
over the three-year study. To analyze the massive data set, we are developing new data mining technologies to help human analysts
rapidly annotate and transcribe recordings using semi-automatic methods, and to detect and visualize salient patterns of behavior
and interaction. To make sense of large-scale patterns that span months or even years of observations, we are developing
computational models of language acquisition that are able to learn from the child’s experiential record. By creating and evaluating
machine learning systems that step into the shoes of the child and sequentially process long stretches of perceptual experience,
we will investigate possible language learning strategies used by children, with an emphasis on early word learning.