
Ontologies, Databases and Applications of Semantics (ODBASE) 2006 International Conference
Similarity and Matching

Finding Similar Objects Using a Taxonomy: A Pragmatic Approach

Peter Schwarz, Yu Deng and Julia E. Rice

(1) IBM Almaden Research Center, San Jose, CA 95120
(2) IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598
Abstract
Several authors have suggested similarity measures for objects labeled with terms from a hierarchical taxonomy. We generalize this idea with a definition of information-theoretic similarity for taxonomies that are structured as directed acyclic graphs from which multiple terms may be used to describe an object. We discuss how our definition should be adapted in the presence of ambiguity, and introduce new similarity measures based on our definitions.
We present an implementation of our measures that is integrated with a relational database and scales to large taxonomies and datasets. We evaluate our measures by applying them to an object-matching problem from bioinformatics, and show that, for this task, our new measures outperform those reported in the literature. We also verified the scalability of our approach by applying it to patent similarity search, using patents classified with terms from the taxonomy defined by the United States Patent and Trademark Office.
Keywords: Semantic similarity measures, Object matching, Taxonomy, Information theoretic similarity.
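
To make the idea of information-theoretic similarity over a DAG-structured taxonomy concrete, here is a minimal sketch. It is not the measure defined in the chapter: it assumes a Lin-style term similarity of the kind found in the earlier literature, and it compares two objects by taking the best-matching pair of terms. The taxonomy, the labeled objects, and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy taxonomy (a DAG): each term maps to its parent terms.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "kinase": ["catalysis"],
    "atp_binding": ["binding"],
    "protein_kinase": ["kinase", "atp_binding"],  # a term with two parents
}

# Hypothetical corpus: each object is described by a set of terms.
LABELS = {
    "obj1": {"protein_kinase"},
    "obj2": {"kinase"},
    "obj3": {"atp_binding"},
    "obj4": {"binding"},
}


def ancestors(term):
    """Return the term itself plus every term reachable via parent links."""
    seen = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def information_content(labels):
    """IC(t) = log(N / n_t), where n_t counts the objects labeled with t
    or with any descendant of t. Terms that no object uses are omitted."""
    n = len(labels)
    counts = {t: 0 for t in PARENTS}
    for terms in labels.values():
        covered = set()
        for t in terms:
            covered |= ancestors(t)
        for t in covered:
            counts[t] += 1
    return {t: math.log(n / c) for t, c in counts.items() if c > 0}


def lin_similarity(a, b, ic):
    """Lin-style similarity: twice the IC of the most informative common
    ancestor, divided by the sum of the two terms' ICs."""
    shared = max(ic[t] for t in ancestors(a) & ancestors(b))
    denom = ic[a] + ic[b]
    return 2 * shared / denom if denom > 0 else 1.0


def object_similarity(x, y, labels, ic):
    """Assumed aggregation: the best-matching pair of terms."""
    return max(lin_similarity(a, b, ic) for a in labels[x] for b in labels[y])


IC = information_content(LABELS)
print(round(object_similarity("obj1", "obj2", LABELS, IC), 3))  # 0.667
print(round(object_similarity("obj2", "obj3", LABELS, IC), 3))  # 0.0
```

In this toy data, obj1 and obj2 share the informative ancestor "kinase" and score 2/3, while obj2 and obj3 share only the root term, which carries no information, and score 0. Taking the maximum over term pairs is just one possible way to handle objects described by several terms.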
An erratum to this chapter can be found at http://dx.doi.org/10.1007/11914853_71.

Peter Schwarz
Email: schwarz@almaden.ibm.com

Yu Deng
Email: dengy@us.ibm.com

Julia E. Rice
Email: julia@almaden.ibm.com