Volume 24, Number 4, 605-622, DOI: 10.1007/s00180-009-0148-x

Reliability and efficiency of algorithms for computing the significance of the Mann–Whitney test

Niranjan Nagarajan and Uri Keich

View Related Documents

Abstract

Motivated by recent applications of the Mann–Whitney U test to large data sets we took a critical look at current methods for computing its significance. Surprisingly, we found that the two fastest and most popular tools for exact computation of the test significance, Dinneen and Blakesley’s and Harding’s, can exhibit large numerical errors even in moderately large datasets. In addition, another method proposed by Pagano and Tritchler also suffers from a similar numerical instability and can produce inaccurate results. This motivated our development of a new algorithm, mw-sFFT, for the exact computation of the Mann–Whitney test with no ties. Among the class of exact algorithms that are numerically stable, mw-sFFT has the best complexity: O(m 2 n) versus O(m 2 n 2) for others, where m and n are the two sample sizes. This asymptotic efficiency is also reflected in the practical runtime of the algorithm. In addition, we also present a rigorous analysis of the propagation of numerical errors in mw-sFFT to derive an error guarantee for the values computed by the algorithm. The reliability and efficiency of mw-sFFT make it a valuable tool in compuational applications and we plan to provide open-source libraries for it in C++ and Matlab.

Keywords  Numerical error - Exact computation - FFT

Fulltext Preview

Image of the first page of the fulltext document