This paper proposes a new knowledge-based method for clustering metagenome short reads. The method incorporates biological
knowledge into the clustering process by means of a list of proteins associated with each read. These proteins are chosen from
a reference proteome database according to their similarity to the given read, as evaluated by BLAST. We introduce a scoring
function for weighting the resulting proteins and use the weighted proteins to cluster the reads. The resulting clustering algorithm
selects the number of clusters automatically and may generate overlapping clusters of reads. Experiments on real-life
benchmark datasets show that the method is effective at reducing the size of a metagenome dataset while maintaining
high accuracy of its organism content.
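The abstract does not specify the scoring function or the clustering rule, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes that each read's BLAST hits are available as hypothetical `(protein_id, bit_score)` pairs, weights them by simple normalisation of the bit scores, and assigns each read to every protein whose weight reaches a hypothetical threshold `min_weight`. Because each such protein seeds a cluster, the number of clusters is not fixed in advance, and a read with several strong hits can belong to several clusters.

```python
from collections import defaultdict


def weight_proteins(hits):
    """Hypothetical scoring: normalise one read's BLAST bit scores so they sum to 1."""
    total = sum(score for _, score in hits)
    if total <= 0:
        return {}
    return {protein: score / total for protein, score in hits}


def overlapping_clusters(read_hits, min_weight=0.3):
    """Group reads by strongly weighted proteins (illustrative rule, not the paper's).

    Every protein reaching min_weight for some read seeds a cluster, so the
    number of clusters follows from the data, and clusters may overlap.
    """
    clusters = defaultdict(set)
    for read, hits in read_hits.items():
        for protein, weight in weight_proteins(hits).items():
            if weight >= min_weight:
                clusters[protein].add(read)
    return dict(clusters)


# Toy input with made-up read and protein IDs.
read_hits = {
    "r1": [("P1", 90.0), ("P2", 10.0)],
    "r2": [("P1", 50.0), ("P3", 50.0)],
    "r3": [("P3", 80.0), ("P4", 20.0)],
}
print(overlapping_clusters(read_hits))
```

On this toy input, read `r2` is split evenly between `P1` and `P3`, so it falls into both clusters, while `P2` and `P4` stay below the threshold and seed no cluster.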