Many applications require modelling techniques that take into account the inherent variability of the given data. In
this paper, we present an approach to modelling class-specific pattern variation based on tangent distance within a statistical
framework for classification. The model is an effective means of explicitly incorporating invariance with respect to transformations
that do not change class membership, such as small affine transformations in the case of image objects. If no prior knowledge
about the type of variability is available, it is desirable to learn the model parameters from the data. The probabilistic
interpretation presented here allows us to view the learning of the variational derivatives as a maximum likelihood estimation
problem. We present experimental results on two different real-world pattern recognition tasks, namely image object recognition
and automatic speech recognition. On the US Postal Service handwritten digit recognition task, learning the variability achieves
results comparable to those obtained using domain-specific knowledge. On the SieTill corpus of continuously spoken German digit
strings recorded over telephone lines, the method yields a significant improvement over a common mixture density approach with
a comparable number of parameters. The probabilistic model is well suited for use in statistical pattern recognition and can be
extended to other domains such as cluster analysis.
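The abstract describes class-specific variation modelled by tangent vectors (variational derivatives) that are learned from data by maximum likelihood. The following Python/NumPy sketch illustrates one way this can look. It assumes a probabilistic-PCA-style model x = mu + L alpha + eps with alpha ~ N(0, I) and isotropic noise of fixed variance sigma2; under that assumption the ML tangent vectors are the leading eigenvectors of the class covariance, scaled by sqrt(eigenvalue - sigma2). This formulation, the function names (`learn_tangents`, `tangent_distance`, `log_likelihood`, `classify`) and the synthetic data are illustrative assumptions, not the paper's exact model or code.

```python
import numpy as np


def learn_tangents(X, n_tangents, sigma2):
    """Hypothetical helper: ML estimate of class mean and tangent vectors.

    Assumed model (probabilistic-PCA style, not necessarily the paper's):
    x = mu + L @ alpha + eps, alpha ~ N(0, I), eps ~ N(0, sigma2 * I),
    with sigma2 fixed by the user. The ML solution for L spans the leading
    eigenvectors of the class covariance, scaled by sqrt(eigenvalue - sigma2).
    """
    mu = X.mean(axis=0)
    C = np.cov(X, rowvar=False, bias=True)      # ML (biased) covariance
    eigvals, eigvecs = np.linalg.eigh(C)        # eigenvalues in ascending order
    lam = eigvals[::-1][:n_tangents]            # largest eigenvalues first
    U = eigvecs[:, ::-1][:, :n_tangents]        # matching eigenvectors
    scale = np.sqrt(np.maximum(lam - sigma2, 0.0))
    return mu, U * scale                        # columns = learned tangent vectors


def tangent_distance(x, mu, L):
    """One-sided tangent distance: squared Euclidean distance from x to the
    affine subspace {mu + L @ alpha} spanned by the tangent vectors."""
    alpha, *_ = np.linalg.lstsq(L, x - mu, rcond=None)
    r = x - mu - L @ alpha
    return float(r @ r)


def log_likelihood(x, mu, L, sigma2):
    """log N(x | mu, sigma2 * I + L @ L.T): the Gaussian obtained by
    integrating out the transformation coefficients alpha."""
    D = x.shape[0]
    S = sigma2 * np.eye(D) + L @ L.T
    _, logdet = np.linalg.slogdet(S)
    d = x - mu
    return -0.5 * (D * np.log(2.0 * np.pi) + logdet + d @ np.linalg.solve(S, d))


def classify(x, models, sigma2, log_priors=None):
    """Bayes decision rule: index of the class with the highest score."""
    scores = np.array([log_likelihood(x, mu, L, sigma2) for mu, L in models])
    if log_priors is not None:
        scores = scores + np.asarray(log_priors)
    return int(np.argmax(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    # Synthetic data: class A varies along axis 0; class B is offset on
    # axis 1 and varies along axis 2. Both have small isotropic noise.
    XA = np.column_stack([3 * rng.standard_normal(n),
                          0.1 * rng.standard_normal(n),
                          0.1 * rng.standard_normal(n)])
    XB = np.column_stack([0.1 * rng.standard_normal(n),
                          2 + 0.1 * rng.standard_normal(n),
                          3 * rng.standard_normal(n)])
    sigma2 = 0.01
    models = [learn_tangents(XA, 1, sigma2), learn_tangents(XB, 1, sigma2)]
    x = np.array([6.0, 1.2, 0.0])
    print("tangent distances:", [tangent_distance(x, mu, L) for mu, L in models])
    print("predicted class:", classify(x, models, sigma2))
```

Integrating out alpha replaces the hard minimisation of tangent distance with a Gaussian whose covariance is sigma2 * I + L L^T, so a standard Bayes decision rule applies. In the demo, a point displaced far along class A's direction of variation keeps a small tangent distance to A even though it lies far from A's mean.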