Motivated by recent applications of the Mann–Whitney
U test to large data sets we took a critical look at current methods for computing its significance. Surprisingly, we found
that the two fastest and most popular tools for exact computation of the test significance, Dinneen and Blakesley’s and Harding’s,
can exhibit large numerical errors even in moderately large datasets. In addition, another method proposed by Pagano and Tritchler
also suffers from a similar numerical instability and can produce inaccurate results. This motivated our development of a
new algorithm, mw-sFFT, for the exact computation of the Mann–Whitney test with no ties. Among the class of exact algorithms
that are numerically stable, mw-sFFT has the best complexity:
O(
m
2
n) versus
O(
m
2
n
2) for others, where
m and
n are the two sample sizes. This asymptotic efficiency is also reflected in the practical runtime of the algorithm. In addition,
we also present a rigorous analysis of the propagation of numerical errors in mw-sFFT to derive an error guarantee for the
values computed by the algorithm. The reliability and efficiency of mw-sFFT make it a valuable tool in compuational applications
and we plan to provide open-source libraries for it in C++ and Matlab.
Keywords Numerical error - Exact computation - FFT