This paper presents an approach for assessing cluster validity based on similarity knowledge extracted from the Gene Ontology
(GO) and databases annotated to the GO. A knowledge-driven cluster validity assessment system for microarray data was implemented.
Different methods were applied to measure similarity between yeast gene products based on the GO. This research proposes
two methods for calculating cluster validity indices using GO-driven similarity. The first approach processes overall similarity
values, which are calculated by taking into account the combined annotations originating from the three GO hierarchies. The
second approach is based on the calculation of GO hierarchy-independent similarity values, which originate from each of these
hierarchies. A traditional node-counting method and an information content technique have been implemented to measure knowledge-based
similarity between gene products (biological distances). The results contribute to the evaluation of clustering outcomes
and the identification of optimal cluster partitions, and may provide an effective tool to support biomedical knowledge
discovery in gene expression data analysis.
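
To make the two families of similarity measure concrete, the following is a minimal sketch rather than the system described here. It assumes a Wu-Palmer-style measure as the node-counting method and a Lin-style measure as the information content technique; the abstract does not name the exact formulas. The toy hierarchy, the annotation counts, and all function names are hypothetical, and the hierarchy is a simple tree, whereas the real GO is a directed acyclic graph in which a term may have several parents. Averaging over all term pairs to obtain a gene-level similarity is likewise an assumption.

```python
import math

# Hypothetical toy hierarchy (child -> parent). Assumption: a tree, not a GO DAG.
PARENT = {"root": None, "a": "root", "b": "root", "a1": "a", "a2": "a", "b1": "b"}

# Hypothetical number of gene products annotated directly to each term.
DIRECT_COUNTS = {"root": 0, "a": 2, "b": 1, "a1": 4, "a2": 3, "b1": 2}


def ancestors(term):
    """Path from a term up to the root, with the term itself first."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def depth(term):
    """Number of nodes on the path from the root to the term (root has depth 1)."""
    return len(ancestors(term))


def lowest_common_ancestor(t1, t2):
    """Deepest term shared by the ancestor paths of t1 and t2."""
    seen = set(ancestors(t1))
    for candidate in ancestors(t2):  # walks upward, so the first hit is the deepest
        if candidate in seen:
            return candidate
    raise ValueError("terms do not share a root")


def node_counting_similarity(t1, t2):
    """Wu-Palmer-style similarity: 2 * depth(lca) / (depth(t1) + depth(t2))."""
    lca = lowest_common_ancestor(t1, t2)
    return 2 * depth(lca) / (depth(t1) + depth(t2))


def term_probability(term):
    """Share of annotations that fall on the term or any of its descendants."""
    total = sum(DIRECT_COUNTS.values())
    covered = sum(n for t, n in DIRECT_COUNTS.items() if term in ancestors(t))
    return covered / total


def information_content_similarity(t1, t2):
    """Lin-style similarity: 2 * IC(lca) / (IC(t1) + IC(t2)), with IC = -log p."""
    def ic(term):
        return -math.log(term_probability(term))

    denominator = ic(t1) + ic(t2)
    if denominator == 0:  # both terms are the root, which carries no information
        return 1.0
    return 2 * ic(lowest_common_ancestor(t1, t2)) / denominator


def gene_similarity(terms1, terms2, term_similarity):
    """Assumed gene-level score: mean similarity over all pairs of annotated terms."""
    pairs = [(s, t) for s in terms1 for t in terms2]
    return sum(term_similarity(s, t) for s, t in pairs) / len(pairs)
```

Under these assumptions, a hierarchy-independent value would come from calling `gene_similarity` with the terms of a single GO hierarchy, while an overall value would pool the annotations from all three hierarchies before averaging. Either score could then stand in for the distance in a conventional cluster validity index.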