Biometric person authentication is a secure and user-friendly way of identifying persons in a variety of everyday applications.
To achieve high recognition rates, we propose an audio-visual person recognition system based on voice, lip motion,
and a still image. These three data sources can be combined (sensor fusion) in several ways. We present
a method for sensor normalization based on statistical properties, which we call sensor calibration. The final fusion then simplifies
to a multiplication or addition of the individual sensor outputs. This approach is evaluated on a large database of 170 people
with a total of 6315 recordings, made in at least two sessions per person.
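
As an illustration only, the calibrate-then-combine idea can be sketched in a few lines of Python. The abstract does not specify which statistical properties are used, so this sketch assumes a z-score against a set of reference scores followed by a logistic squashing to (0, 1). That choice, the function names calibrate and fuse, and all numeric values are hypothetical and are not taken from the proposed system.

```python
import math
from statistics import mean, stdev


def calibrate(raw_score, reference_scores):
    """Map one sensor's raw score onto a common (0, 1) scale.

    ASSUMPTION: calibration is a z-score against reference scores,
    followed by a logistic function. The actual sensor calibration
    may use different statistics.
    """
    mu = mean(reference_scores)
    sigma = stdev(reference_scores)  # sample standard deviation
    z = (raw_score - mu) / sigma
    return 1.0 / (1.0 + math.exp(-z))


def fuse(calibrated_scores, rule="sum"):
    """Combine calibrated sensor outputs by addition or multiplication."""
    if rule == "sum":
        return sum(calibrated_scores)
    if rule == "product":
        return math.prod(calibrated_scores)
    raise ValueError(f"unknown fusion rule: {rule!r}")


# Hypothetical raw scores on three very different scales.
references = {
    "voice": [10.0, 20.0, 30.0],
    "lip_motion": [0.1, 0.2, 0.3],
    "still_image": [100.0, 200.0, 300.0],
}
raw = {"voice": 30.0, "lip_motion": 0.2, "still_image": 200.0}

calibrated = [calibrate(raw[name], references[name]) for name in raw]
print(fuse(calibrated, "sum"), fuse(calibrated, "product"))
```

Once every sensor reports on the same scale, no sensor-specific weights are needed at fusion time, which is why the combination step can be a plain sum or product.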