Partitioning a multi-dimensional data set into rectangular partitions subject to certain constraints is an important problem
that arises in many database applications, including histogram-based selectivity estimation, load-balancing, and construction
of index structures. While provably optimal and efficient algorithms exist for partitioning one-dimensional data, the multi-dimensional
problem has received less attention, except for a few special cases. As a result, the heuristic partitioning techniques that
are used in practice are not well understood, and come with no guarantees on the quality of the solution. In this paper, we
present algorithmic and complexity-theoretic results for the fundamental problem of partitioning a two-dimensional array into
rectangular tiles of arbitrary size in a way that minimizes the number of tiles required to satisfy a given constraint. Our
main results are approximation algorithms for several partitioning problems that provably approximate the optimal solutions
within small constant factors, and that run in linear or close to linear time. We also establish the NP-hardness of several
partitioning problems, therefore it is unlikely that there are efficient, i.e., polynomial time, algorithms for solving these
problems
exactly.
We also discuss a few applications in which partitioning problems arise. One of the applications is the problem of constructing
multi-dimensional histograms. Our results, for example, give an efficient algorithm to construct the V-Optimal histograms which are known to be the most accurate histograms in several selectivity estimation problems. Our algorithms
are the first to provide guaranteed bounds on the quality of the solution.
This work was done while the author was at Bell Labs.