This paper presents several pre-processing techniques coupled with three speaker diarization systems, evaluated in the framework of
the NIST 2005 Spring Rich Transcription campaign (RT’05S).
The pre-processing techniques provide a signal quality index that is used to build a single “virtual” signal
from all the microphone recordings available for a meeting. This virtual signal is a weighted sum of the individual
microphone signals, and the signal quality index is derived from a signal-to-noise ratio.
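
As a rough illustration of the weighted-sum idea, the following minimal sketch combines time-aligned channels using weights derived from per-channel signal-to-noise ratios. The function names `snr_weights` and `virtual_signal` are hypothetical, and the mapping from SNR to weight (proportional to linear-scale SNR, normalized to sum to 1) is an assumption made for this example, not the exact quality index used in the paper.

```python
def snr_weights(snr_db):
    """Map per-channel SNR estimates (in dB) to weights that sum to 1.

    Assumption: weights are proportional to the linear-scale SNR.
    The paper's actual quality index may be defined differently.
    """
    linear = [10.0 ** (s / 10.0) for s in snr_db]
    total = sum(linear)
    return [v / total for v in linear]


def virtual_signal(channels, snr_db):
    """Return the weighted sum of time-aligned channels of equal length.

    channels: list of sample lists, one per microphone.
    snr_db:   one SNR estimate (dB) per channel for this block of samples.
    """
    if len(channels) != len(snr_db):
        raise ValueError("need exactly one SNR value per channel")
    n = len(channels[0])
    if any(len(c) != n for c in channels):
        raise ValueError("all channels must have the same length")
    weights = snr_weights(snr_db)
    # Sample-by-sample weighted sum across microphones.
    return [sum(w * c[t] for w, c in zip(weights, channels)) for t in range(n)]


# Usage: two microphones, the first with the higher estimated SNR.
mics = [[0.2, 0.4, -0.1], [0.1, 0.5, 0.0]]
combined = virtual_signal(mics, snr_db=[20.0, 5.0])
```

If the SNR is estimated over time, the same function could be applied frame by frame, with a separate set of weights for each frame.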
Two methods are used in this paper to compute the instantaneous signal-to-noise ratio: an approach based on speech activity detection
and a noise spectrum estimate. The speaker diarization task is performed using systems developed by three labs: LIA,
LIUM and CLIPS. Among the system submissions made by these three labs, the best system obtained a speaker diarization
error of 24.5% for the conference subdomain and 18.4% for the lecture subdomain.