We present a new model-based monaural speech separation technique for separating two speech signals from a single recording
of their mixture. This work addresses a fundamental limitation of current model-based monaural speech separation
techniques, which assume that the data used in the training and test phases of the separation model have the same
energy level. To overcome this limitation, we derive a gain-adapted minimum mean square error (MMSE) estimator that estimates
the sources under different signal-to-signal ratios. Specifically, the speakers' gains are incorporated into the separation
model as unknown parameters, and the estimator is then derived in terms of the source distributions and the signal-to-signal
ratio. Experimental results show that the proposed system significantly improves separation performance compared with both
a similar model without gain adaptation and a maximum likelihood estimator with gain estimation.
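The abstract does not spell out the estimator itself. As a rough, non-authoritative illustration, the Python sketch below shows the general shape of an MMSE estimator under the mixmax approximation y = max(x1 + g1, x2 + g2) in the log-spectral domain, with the speaker gains entering as offsets. Several assumptions are ours, not the paper's: each speaker is modeled by a diagonal-covariance Gaussian mixture over log-spectral bins, the log-domain gains g1 and g2 are treated as already known (how they are estimated is not shown), and the names `mmse_mixmax` and `_pair_stats` are hypothetical. Changing the gains shifts each model's means, and therefore which speaker is expected to dominate each bin.

```python
import math


def _pdf(x, m, s):
    """Gaussian density N(x; m, s^2)."""
    z = (x - m) / s
    return math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))


def _cdf(x, m, s):
    """Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))


def _pair_stats(y, ma, sa, mb, sb):
    """Likelihood of y = max(a, b) and E[a | y] for a ~ N(ma, sa^2), b ~ N(mb, sb^2)."""
    fa, Fa = _pdf(y, ma, sa), _cdf(y, ma, sa)
    fb, Fb = _pdf(y, mb, sb), _cdf(y, mb, sb)
    lik = fa * Fb + fb * Fa
    if lik <= 0.0:
        return 0.0, None
    # a dominates (a = y): contributes y * fa * Fb.
    # b dominates (b = y, a < y): contributes fb * E[a; a < y] = fb * (ma*Fa - sa^2*fa).
    num = y * fa * Fb + fb * (ma * Fa - sa * sa * fa)
    return lik, num / lik


def mmse_mixmax(y, gmm1, gmm2, g1, g2):
    """Hypothetical MMSE estimate of both gain-scaled log-spectra given the mixture y.

    y: list of log-spectral values, one per frequency bin.
    gmm: (weights, means, stds); means[k] and stds[k] are per-bin lists.
    g1, g2: assumed-known log-domain gains added to each speaker's means.
    """
    w1, mu1, sd1 = gmm1
    w2, mu2, sd2 = gmm2
    n_bins = len(y)
    log_post, cond1, cond2 = [], [], []
    for i in range(len(w1)):
        for j in range(len(w2)):
            logp = math.log(w1[i]) + math.log(w2[j])
            e1, e2, ok = [], [], True
            for d in range(n_bins):
                ma, sa = mu1[i][d] + g1, sd1[i][d]
                mb, sb = mu2[j][d] + g2, sd2[j][d]
                lik, ea = _pair_stats(y[d], ma, sa, mb, sb)
                if ea is None:
                    ok = False  # this component pair cannot explain y
                    break
                _, eb = _pair_stats(y[d], mb, sb, ma, sa)
                logp += math.log(lik)  # bins are independent given (i, j)
                e1.append(ea)
                e2.append(eb)
            if ok:
                log_post.append(logp)
                cond1.append(e1)
                cond2.append(e2)
    if not log_post:
        raise ValueError("no component pair has nonzero likelihood")
    # Normalize the posterior over component pairs (log-sum-exp for stability).
    top = max(log_post)
    post = [math.exp(lp - top) for lp in log_post]
    total = sum(post)
    est1 = [sum(p * c[d] for p, c in zip(post, cond1)) / total for d in range(n_bins)]
    est2 = [sum(p * c[d] for p, c in zip(post, cond2)) / total for d in range(n_bins)]
    return est1, est2
```

For example, with a single standard-normal component per speaker, a mixture bin y = 0, and g2 = -100, speaker 2 is effectively silent: speaker 1's estimate equals y and speaker 2's estimate falls to its (gain-shifted) prior mean of about -100. A model that ignored the gains would split the bin between the speakers instead.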
Keywords: Source separation · Model-based monaural speech separation · Minimum mean square error estimation · Gain adaptation · Mixmax approximation
A preliminary version of this paper was presented at the IEEE Workshop on Machine Learning for Signal Processing (MLSP), held
in Thessaloniki, Greece, in August 2007.