Volume 17, Number 4, 311-321, DOI: 10.1007/s11222-007-9021-3

Data skeletons: simultaneous estimation of multiple quantiles for massive streaming datasets with applications to density estimation

James P. McDermott, G. Jogesh Babu, John C. Liechty and Dennis K. J. Lin

Abstract

We consider the problem of density estimation when the data arrive as a continuous stream with no fixed length. In this setting, the usual methods of density estimation, such as kernel density estimation, are problematic to implement. We propose a method of density estimation for massive datasets based on taking the derivative of a smooth curve fit through a set of quantile estimates. To supply those estimates, we propose a low-storage, single-pass, sequential method for simultaneous estimation of multiple quantiles for massive datasets. For comparison, we also consider a sequential kernel density estimator. A simulation study shows that the proposed methods perform well and have several distinct advantages over existing methods.

Keywords: Sequential quantile estimation, Sequential density estimation, Online algorithms, Sequential algorithms, Cubic spline
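
The idea in the abstract (track several quantiles in a single pass, then differentiate a curve through them to get a density) can be illustrated with a minimal sketch. The abstract gives neither the authors' update rule nor their smoothing step, so this sketch swaps in two simpler stand-ins: a stochastic-approximation quantile update and a piecewise-linear curve through the quantile estimates in place of a smooth (cubic spline) fit. The names `StreamingQuantiles` and `density_at`, the warm-up size, and the step-size schedule are all hypothetical choices made for illustration.

```python
import random


class StreamingQuantiles:
    """Tracks several quantiles of a stream using O(len(probs)) memory.

    Assumption: a stochastic-approximation update, NOT the paper's method.
    """

    def __init__(self, probs, warmup=50):
        self.probs = sorted(probs)
        self.warmup = warmup      # size of the small initial buffer (assumed)
        self.buffer = []
        self.q = None             # current quantile estimates
        self.n = 0                # number of observations seen so far
        self.scale = 1.0          # rough data scale, set after warm-up

    def update(self, x):
        self.n += 1
        if self.q is None:
            # Warm-up: buffer a few points to get starting values and a scale.
            self.buffer.append(x)
            if len(self.buffer) == self.warmup:
                s = sorted(self.buffer)
                self.q = [s[int(p * (len(s) - 1))] for p in self.probs]
                self.scale = (s[-1] - s[0]) or 1.0
                self.buffer = []
            return
        # Step shrinks like n^-0.75 (an assumed schedule).
        step = self.scale / self.n ** 0.75
        for i, p in enumerate(self.probs):
            # Move up with weight p if x is above the estimate,
            # down with weight (1 - p) otherwise.
            self.q[i] += step * (p - (1.0 if x <= self.q[i] else 0.0))
        self.q.sort()             # keep the estimates monotone


def density_at(x, probs, quantiles):
    """Slope of the piecewise-linear curve through (quantile, prob) points.

    Returns 0.0 outside the tracked quantile range: the tails are not
    estimated in this sketch.
    """
    for i in range(len(probs) - 1):
        lo, hi = quantiles[i], quantiles[i + 1]
        if lo <= x <= hi and hi > lo:
            return (probs[i + 1] - probs[i]) / (hi - lo)
    return 0.0


if __name__ == "__main__":
    random.seed(0)
    est = StreamingQuantiles([0.1, 0.25, 0.5, 0.75, 0.9])
    for _ in range(50000):
        est.update(random.gauss(0.0, 1.0))
    print("quantile estimates:", est.q)
    print("density near -0.3:", density_at(-0.3, est.probs, est.q))
```

After the warm-up buffer is discarded, storage depends only on the number of tracked quantiles and not on the stream length, which is the property the abstract emphasizes. A piecewise-linear curve yields a step-function density; a smooth fit such as the cubic spline named in the keywords would give a continuous one.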
