This paper presents the person identification system developed at Athens Information Technology and its performance in the
CLEAR 2007 evaluations. The system operates on the audiovisual information (speech and faces) collected over the duration
of gallery and probe videos. It comprises an audio-only (speech), a video-only (face) and an audiovisual fusion subsystem.
Audio recognition is based on Gaussian mixture modeling of the principal components of composite feature vectors, consisting
of Mel-Frequency Cepstral Coefficients and Perceptual Linear Prediction coefficients of speech. Video recognition is based
on combining three different classification algorithms: Principal Components Analysis with a modified Mahalanobis distance,
sub-class Linear Discriminant Analysis (featuring automatic sub-class generation) with cosine distance, and a Bayesian classifier
based on Gaussian modeling of intrapersonal differences. A nearest neighbor classification rule is applied. A decision fusion
scheme across time and classifiers returns the video identity. The audiovisual subsystem fuses the unimodal identities into
the multimodal one, using a suitable confidence metric.
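
The two fusion stages described above can be illustrated with a minimal sketch. It rests on assumptions not stated in the text: the decision fusion across time and classifiers is approximated here by a plain majority vote, the confidence metric is taken to be the winning vote share, and the audiovisual stage simply keeps the more confident modality. The function names fuse_video_decisions and fuse_audiovisual are hypothetical, and the actual system may use a different scheme and metric.

```python
from collections import Counter


def fuse_video_decisions(frame_decisions):
    """Fuse per-frame, per-classifier identities into one video identity.

    frame_decisions: list of frames; each frame is a list holding the
    identity returned by each face classifier for that frame.
    Returns (identity, confidence), where confidence is the share of all
    votes won by the chosen identity (an assumed metric, for illustration).
    """
    votes = Counter(identity for frame in frame_decisions for identity in frame)
    identity, count = votes.most_common(1)[0]
    return identity, count / sum(votes.values())


def fuse_audiovisual(audio, video):
    """Pick the unimodal identity with the higher confidence.

    audio, video: (identity, confidence) pairs. Ties go to audio, which is
    an arbitrary choice made only for this sketch.
    """
    return audio[0] if audio[1] >= video[1] else video[0]


# Three frames, three classifiers per frame (toy identities).
frames = [
    ["anna", "anna", "bob"],
    ["anna", "bob", "bob"],
    ["anna", "anna", "anna"],
]
video_result = fuse_video_decisions(frames)
audio_result = ("bob", 0.55)  # assumed output of the audio subsystem
final_identity = fuse_audiovisual(audio_result, video_result)
```

In this toy case the video subsystem collects six of nine votes for one identity, which outweighs the assumed audio confidence, so the video identity is kept as the multimodal one.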