This paper shows how Wikipedia and the semantic knowledge it contains can be exploited for document clustering. We first create
a concept-based document representation by mapping the terms and phrases within documents to their corresponding articles
(or concepts) in Wikipedia. We then develop a similarity measure that evaluates the semantic relatedness between the concept
sets of two documents. We test the concept-based representation and the similarity measure on two standard text document
datasets. Empirical results show that, although further optimization is possible, our approach already improves upon
related techniques.
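
To make the two steps above concrete, the following is a minimal illustrative sketch, not the implementation evaluated in this paper. It assumes a tiny hand-written phrase-to-article dictionary (ANCHORS) and a made-up table of pairwise concept relatedness scores (RELATEDNESS); both are hypothetical stand-ins for data that would be derived from Wikipedia. The greedy longest-match mapping and the symmetric best-match average used as the set similarity are likewise assumptions chosen for brevity.

```python
# Hypothetical phrase -> Wikipedia article (concept) dictionary.
# A real system would build this from Wikipedia itself.
ANCHORS = {
    "document clustering": "Document clustering",
    "clustering": "Cluster analysis",
    "text mining": "Text mining",
    "wikipedia": "Wikipedia",
}

# Hypothetical pairwise relatedness scores in [0, 1]; unlisted pairs count as 0.
RELATEDNESS = {
    frozenset(("Document clustering", "Cluster analysis")): 0.8,
    frozenset(("Document clustering", "Text mining")): 0.6,
}


def map_to_concepts(text, anchors=ANCHORS, max_len=3):
    """Map terms and phrases to concepts using greedy longest-phrase matching.

    Tokenization is a plain whitespace split on lowercased text, so
    punctuation is not handled in this sketch.
    """
    tokens = text.lower().split()
    concepts = set()
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in anchors:
                concepts.add(anchors[phrase])
                i += n
                break
        else:
            i += 1  # no phrase starting here maps to a concept
    return concepts


def relatedness(a, b):
    """Relatedness of two concepts; identical concepts score 1.0."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset((a, b)), 0.0)


def concept_set_similarity(set_a, set_b):
    """Symmetric average of each concept's best match in the other set."""
    if not set_a or not set_b:
        return 0.0

    def directed(src, dst):
        return sum(max(relatedness(a, b) for b in dst) for a in src) / len(src)

    return 0.5 * (directed(set_a, set_b) + directed(set_b, set_a))


if __name__ == "__main__":
    doc_a = map_to_concepts("document clustering with wikipedia")
    doc_b = map_to_concepts("clustering for text mining")
    print(sorted(doc_a), sorted(doc_b))
    print(concept_set_similarity(doc_a, doc_b))
```

The two example documents share no concept, yet they receive a nonzero similarity because their concepts are related; this is the property that distinguishes a concept-based representation from exact term overlap. The resulting pairwise similarities could then be passed to any standard clustering algorithm.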