Volume 10, Number 4, 332-343, DOI: 10.1007/s00530-004-0160-5

Unsupervised speaker segmentation and tracking in real-time audio content analysis

Lie Lu and Hong-Jiang Zhang

Abstract

This paper addresses the problem of real-time speaker segmentation and speaker tracking in audio content analysis when neither the number of speakers nor their identities are known in advance. Speaker segmentation detects the speaker change boundaries in a speech stream. It is performed by a two-step algorithm consisting of potential change detection followed by refinement. Speaker tracking is then performed on the segmentation results by identifying the speaker of each segment. In our approach, incremental speaker model updating and segmental clustering are proposed, which make unsupervised speaker segmentation and tracking feasible in real-time processing. A Bayesian fusion method is also proposed to fuse multiple audio features and obtain a more reliable result, and different noise levels are used to compensate for background mismatch. Experiments show that the proposed algorithm recalls 89% of speaker change boundaries with 15% false alarms, and that 76% of speakers can be identified without supervision with 20% false alarms. Compared with previous work, the algorithm also has low computational complexity and runs in 15% of real time with a very limited analysis delay.
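The boundary-detection figures above (recall and false alarms) can be illustrated with a minimal scoring sketch. It is not the evaluation code used in the paper. The function name `score_boundaries`, the 1-second matching tolerance, and the definition of the false alarm rate as the fraction of detected boundaries that match no reference boundary are all assumptions; the abstract specifies none of them.

```python
def score_boundaries(reference, detected, tolerance=1.0):
    """Return (recall, false_alarm_rate) for change points given in seconds.

    Assumptions (not taken from the paper): a detection counts as correct
    if it lies within `tolerance` seconds of a reference boundary that has
    not been matched yet, and the false alarm rate is the share of
    detections left unmatched.
    """
    unmatched = sorted(reference)
    hits = 0
    for d in sorted(detected):
        # Closest reference boundary that has not been matched so far.
        best = min(unmatched, key=lambda r: abs(r - d), default=None)
        if best is not None and abs(best - d) <= tolerance:
            unmatched.remove(best)  # each reference boundary matches once
            hits += 1
    recall = hits / len(reference) if reference else 0.0
    false_alarm_rate = (len(detected) - hits) / len(detected) if detected else 0.0
    return recall, false_alarm_rate


# Hypothetical boundaries, for illustration only.
reference = [10.0, 25.0, 40.0, 58.0]
detected = [10.4, 24.2, 33.0, 57.5]
recall, false_alarms = score_boundaries(reference, detected)
print(f"recall={recall:.2f}, false alarms={false_alarms:.2f}")
# prints "recall=0.75, false alarms=0.25"
```

With these made-up values, three of the four reference boundaries are found and one of the four detections is spurious. A different tolerance, or a false alarm rate normalized by the number of reference boundaries, would give different figures.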

Keywords:  Audio content analysis - Audio indexing - Speaker segmentation - Speaker change detection - Speaker tracking

Published online: 12 January 2005
Part of the work presented in this paper was published in the 10th ACM International Conference on Multimedia, 1-6 December 2002
