Welcome!
To use the personalized features of this site, please log in or register.
If you have forgotten your username or password, we can help.
My Menu
Saved Items

Optimal Dimension Order: A Generic Technique for the Similarity Join

Christian BöhmContact Information, Florian KrebsContact Information and Hans-Peter KriegelContact Information

(7)  University for Health Informatics and Technology, Innsbruck
(8)  University of Munich, Munich
Abstract
The similarity join is an important database primitive which has been successfully applied to speed up applications such as similarity search, data analysis and data mining. The similarity join combines two point sets of a multidimensional vector space such that the result contains all point pairs where the distance does not exceed a given Parameter ∈. Although the similarity join is clearly CPU bound, most previous publications propose strategies that primarily improve the I/O performance. Only little effort has been taken to address CPU aspects. In this Paper, we show that most of the computational overhead is dedicated to the final distance computations between the feature vectors. Consequently, we propose a generic technique to reduce the response time of a large number of basic algorithms for the similarity join. It is applicable for index based join algorithms as well as for most join algorithms based on hashing or sorting. Our technique, called Optimal Dimension Order, is able to avoid and accelerate distance calculations between feature vectors by a careful order of the dimensions. The order is determined according to a probability model. In the experimental evaluation, we show that our technique yields high performance improvements for various underlying similarity join algorithms such as the R-tree similarity join, the breadth- first-R-tree join, the Multipage Index Join, and the ∈-Grid-Order.

Contact Information Christian Böhm
Email: Christian.Boehm@umit.at

Contact Information Florian Krebs
Email: krebs@dbs.informatik.uni-muenchen.de

Contact Information Hans-Peter Kriegel
Email: kriegel@dbs.informatik.uni-muenchen.de
Fulltext Preview (Small, Large)
Image of the first page of the fulltext

References secured to subscribers.



Export this chapter
Export this chapter as RIS | Text
 
Remote Address: 38.107.191.105 • Server: mpweb23
HTTP User Agent: CCBot/1.0 (+http://www.commoncrawl.org/bot.html)