Declustering Web Content Indices for Parallel Information Retrieval
Yoojin Chung (5), Hyuk-Chul Kwon (6), Sang-Hwa Chung (6) and Kwang Ryel Ryu (6)
(5) Research Institute of Computer, Information & Communication, Pusan National University, Pusan, 609-735, South Korea
(6) School of Electrical and Computer Engineering, Pusan National University, Pusan, 609-735, South Korea
Abstract
We consider an information retrieval (IR) system on a low-cost, high-performance PC cluster. The IR system replicates
Web pages locally, indexes them with an inverted-index file (IIF), and uses the vector space model as its ranking strategy.
The IIF is partitioned into pieces using either the lexical or the greedy declustering method, and the pieces are distributed
to the cluster nodes to be stored on each node's hard disk. The lexical method assigns the terms of the IIF, in lexicographic
order, to the processing nodes in turn. The greedy method is based on the probability of co-occurrence of an arbitrary pair
of terms in the IIF. For each incoming user query with multiple terms, the terms are sent to the nodes that hold the relevant
pieces of the IIF and are evaluated in parallel. We study how query performance is affected by the two declustering methods
for IIFs of various sizes. According to the experiments, the greedy method shows an overall improvement of about 3.7%
over the lexical method.
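The two declustering ideas can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation. The lexical function follows the abstract directly: sorted terms are dealt to the nodes in turn. The greedy function is a hypothetical heuristic: the abstract does not say how co-occurrence probabilities are estimated or used, so this sketch assumes they are counted from a sample query log and that each term is placed on the node where it co-occurs least with terms already stored there. All function and parameter names are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def lexical_decluster(terms, num_nodes):
    """Assign terms, in lexicographic order, to the nodes in turn."""
    return {term: i % num_nodes for i, term in enumerate(sorted(set(terms)))}


def cooccurrence_counts(queries):
    """Count how often each unordered pair of terms appears in the same query.

    Assumption: co-occurrence is measured over a sample query log.
    """
    counts = defaultdict(int)
    for query in queries:
        for a, b in combinations(sorted(set(query)), 2):
            counts[(a, b)] += 1
    return counts


def greedy_decluster(terms, queries, num_nodes):
    """Hypothetical greedy heuristic, not the paper's exact algorithm.

    Each term goes to the node with the lowest total co-occurrence with
    the terms already there; ties are broken by node load, then node index.
    """
    counts = cooccurrence_counts(queries)

    def weight(a, b):
        return counts.get((a, b) if a < b else (b, a), 0)

    nodes = [[] for _ in range(num_nodes)]
    assignment = {}
    for term in sorted(set(terms)):
        best = min(
            range(num_nodes),
            key=lambda n: (sum(weight(term, t) for t in nodes[n]), len(nodes[n]), n),
        )
        nodes[best].append(term)
        assignment[term] = best
    return assignment


if __name__ == "__main__":
    terms = ["apple", "banana", "cherry", "date"]
    queries = [["apple", "cherry"], ["apple", "cherry"], ["banana", "date"]]
    print(lexical_decluster(terms, 2))
    print(greedy_decluster(terms, queries, 2))
```

In this toy input, the lexical assignment puts "apple" and "cherry" on the same node even though they are always queried together, while the greedy sketch separates them so that both posting lists can be read in parallel.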
1 This paper was supported in part by the Korea Science and Engineering Foundation under contract No. 2000-2-30300-002-3.