The minimum test collection problem is defined as follows. Given a ground set
$
\mathcal{S}
$
\mathcal{S}
and a collection
$
\mathcal{C}
$
\mathcal{C}
of tests (subsets of
$
\mathcal{S}
$
\mathcal{S}
), find the minimum subcollection
$
\mathcal{C}'
$
\mathcal{C}'
of
$
\mathcal{C}
$
\mathcal{C}
such that for every pair of elements (
x, y) in
$
\mathcal{S}
$
\mathcal{S}
there exists a test in
$
\mathcal{C}'
$
\mathcal{C}'
that contains exactly one of
x and
y. It is well known that the greedy algorithm gives a 1 + 2lnn approximation for the test collection problem where
$
n = \left| \mathcal{S} \right|
$
n = \left| \mathcal{S} \right|
, the size of the ground set. In this paper, we show that this algorithm is close to the best possible, namely that there
is no
o(log
n)-approximation algorithm for the test collection problem unless
P = NP.
We give approximation algorithms for this problem in the case when all the tests have a small cardinality, significantly improving
the performance guarantee achievable by the greedy algorithm. In particular, for instances with test sizes at most
k we derive an
O(log
k) approximation. We show APX-hardness of the version with test sizes at most two, and present an approximation algorithm with
ratio
$
\tfrac{7}
{6} + \varepsilon
$
\tfrac{7}
{6} + \varepsilon
for any fixed ε > 0.
Supported by a Merck Computational Biology and Chemistry Program Graduate Fellowship from the Merck Company Foundation.
Supported in part by subcontract No. 16082-RFP-00-2C in the area of “Combinatorial Optimization in Biology (XAXE),” Los Alamos
National Laboratories.