Welcome!
To use the personalized features of this site, please log in or register.
If you have forgotten your username or password, we can help.
|
 |
Optimal Dimension Order: A Generic Technique for the Similarity Join
| |
|
Optimal Dimension Order: A Generic Technique for the Similarity Join
Christian Böhm7 , Florian Krebs8 and Hans-Peter Kriegel8 
| (7) |
University for Health Informatics and Technology, Innsbruck |
| (8) |
University of Munich, Munich |
Abstract
The similarity join is an important database primitive which has been successfully applied to speed up applications such as
similarity search, data analysis and data mining. The similarity join combines two point sets of a multidimensional vector
space such that the result contains all point pairs where the distance does not exceed a given Parameter ∈. Although the similarity
join is clearly CPU bound, most previous publications propose strategies that primarily improve the I/O performance. Only
little effort has been taken to address CPU aspects. In this Paper, we show that most of the computational overhead is dedicated
to the final distance computations between the feature vectors. Consequently, we propose a generic technique to reduce the
response time of a large number of basic algorithms for the similarity join. It is applicable for index based join algorithms
as well as for most join algorithms based on hashing or sorting. Our technique, called Optimal Dimension Order, is able to
avoid and accelerate distance calculations between feature vectors by a careful order of the dimensions. The order is determined
according to a probability model. In the experimental evaluation, we show that our technique yields high performance improvements
for various underlying similarity join algorithms such as the R-tree similarity join, the breadth- first-R-tree join, the
Multipage Index Join, and the ∈-Grid-Order.
Fulltext Preview (Small, Large)
 References secured to subscribers.
|
|
|
|
|
|