Interest in methods for automatically generating content indices for multimedia documents, particularly video and audio
documents, has surged in recent years. This has in turn drawn attention to methods for analyzing transcribed documents from
audio and video broadcasts, telephone conversations, and messages. This paper addresses such analysis by presenting a
clustering technique that partitions a set of transcribed documents into meaningful
topics. Our method determines the intersection between matching transcripts, evaluates the information contribution of each
transcript, assesses the information closeness of overlapping words, and calculates similarity using the chi-square method.
The main novelty of our method lies in the proposed similarity measure that is designed to withstand the imperfections of
transcribed documents. Experimental results on documents of varying transcription quality are presented to demonstrate
the efficacy of the proposed methodology.
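
As a rough illustration of the kind of computation described above, the following minimal sketch compares two tokenized transcripts over their shared vocabulary with a Pearson chi-square statistic. It is not the similarity measure proposed in this paper: the function name `chi_square_overlap`, the use of raw word counts, and the restriction to shared words are all assumptions made for the example. A statistic near zero suggests that the overlapping words are distributed similarly in both transcripts.

```python
from collections import Counter


def chi_square_overlap(doc_a, doc_b):
    """Hypothetical helper: chi-square statistic over words shared by two transcripts.

    doc_a, doc_b: lists of word tokens.
    Returns (chi2, shared_words); chi2 is None when the transcripts share no words.
    """
    counts_a, counts_b = Counter(doc_a), Counter(doc_b)

    # Intersection between the two transcripts' vocabularies.
    shared = sorted(set(counts_a) & set(counts_b))
    if not shared:
        return None, shared

    # 2 x k contingency table restricted to the shared words (an assumption).
    row_a = [counts_a[w] for w in shared]
    row_b = [counts_b[w] for w in shared]
    total_a, total_b = sum(row_a), sum(row_b)
    grand = total_a + total_b

    chi2 = 0.0
    for a, b in zip(row_a, row_b):
        col = a + b
        # Expected counts if both transcripts used the word at the same rate.
        exp_a = total_a * col / grand
        exp_b = total_b * col / grand
        chi2 += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return chi2, shared


if __name__ == "__main__":
    t1 = ["stock", "stock", "stock", "price"]
    t2 = ["stock", "price", "price", "price"]
    print(chi_square_overlap(t1, t2))
```

Restricting the table to shared words keeps every expected count positive, so the statistic is always defined when the transcripts overlap at all. A full pipeline would also need to account for transcription errors, which this sketch does not attempt.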