Lecture Notes in Computer Science, 2007, Volume 4822/2007, 250-256, DOI: 10.1007/978-3-540-77094-7_34

Personal Name Disambiguation in Web Search Results Based on a Semi-supervised Clustering Approach

Kazunari Sugiyama and Manabu Okumura

Abstract

Most previous work on disambiguating personal names in Web search results employs agglomerative clustering. In contrast, we adopt a semi-supervised clustering approach in order to guide the clustering more appropriately. Our proposed approach is novel in that it controls the fluctuation of the centroid of a cluster. It achieved a purity of 0.72 and an inverse purity of 0.81, with a harmonic mean F of 0.76.
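
The abstract reports purity, inverse purity, and their harmonic mean F without defining them. The sketch below assumes the definitions commonly used in clustering evaluation: purity credits each cluster with its majority gold class, and inverse purity credits each gold class with the cluster that covers it best. The function names and the toy data are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def purity(clusters, gold):
    """Fraction of items that belong to the majority gold class of their cluster.

    clusters: list of lists of item ids.
    gold: dict mapping every item id to its gold label.
    """
    total = sum(len(c) for c in clusters)
    matched = sum(max(Counter(gold[x] for x in c).values()) for c in clusters if c)
    return matched / total


def inverse_purity(clusters, gold):
    """Purity with the roles of clusters and gold classes swapped."""
    # Map each item to the index of the cluster it was assigned to.
    assigned = {x: i for i, c in enumerate(clusters) for x in c}
    # Group items by gold label.
    classes = {}
    for item, label in gold.items():
        classes.setdefault(label, []).append(item)
    return purity(list(classes.values()), assigned)


def f_measure(p, ip):
    """Harmonic mean of purity and inverse purity."""
    return 2 * p * ip / (p + ip)


# Toy data (hypothetical): five pages retrieved for one ambiguous name.
clusters = [["d1", "d2", "d3"], ["d4", "d5"]]
gold = {"d1": "A", "d2": "A", "d3": "B", "d4": "B", "d5": "B"}

p = purity(clusters, gold)
ip = inverse_purity(clusters, gold)
print(p, ip, f_measure(p, ip))

# Consistency check on the figures reported in the abstract.
print(round(f_measure(0.72, 0.81), 2))
```

Under these assumed definitions, the reported values are mutually consistent: 2 x 0.72 x 0.81 / (0.72 + 0.81) is approximately 0.76.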

Keywords  Information retrieval - Semi-supervised clustering - Personal name disambiguation
