Abstract

In this paper, we present the storage management of the WHOWEDA web warehousing system, which warehouses historical web information. To facilitate inter-table and intra-table sharing of web pages, we propose a three-layer storage architecture consisting of tuple, table, and pool layers, whose storage modules hold different parts of the warehoused web information. To improve retrieval efficiency, we replicate some node attributes across web tables in the table layer while keeping only a single copy of each web page in the pool layer. Separating table-layer and pool-layer storage also allows multiple web tables to maintain different valid times for the same web pages, because global coupling runs on different schedules across web tables. Because sharing web pages may lead to valid time inconsistency between web tables, we propose an update synchronization scheme that resolves the valid time differences on user request.
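
The layering described above can be illustrated with a minimal Python sketch. It is not the WHOWEDA implementation: the class names (`PagePool`, `NodeEntry`, `WebTable`), the integer `valid_until` timestamp, and the "align to the latest valid time" policy in `synchronize` are all assumptions made for illustration, and the tuple layer is reduced to a plain list of page-id groups.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PagePool:
    """Pool layer (hypothetical): one stored copy per distinct page id."""
    pages: Dict[str, str] = field(default_factory=dict)

    def put(self, page_id: str, content: str) -> None:
        # Keep the first copy; later tables referencing the page share it.
        self.pages.setdefault(page_id, content)


@dataclass
class NodeEntry:
    """Table-layer record: replicated node attributes plus a per-table valid time."""
    url: str
    title: str
    valid_until: int  # assumed integer timestamp, not from the paper


@dataclass
class WebTable:
    """Table layer (hypothetical): node attributes and tuples for one web table."""
    name: str
    pool: PagePool
    nodes: Dict[str, NodeEntry] = field(default_factory=dict)
    tuples: List[Tuple[str, ...]] = field(default_factory=list)  # tuple layer stand-in

    def add_node(self, page_id: str, url: str, title: str,
                 content: str, valid_until: int) -> None:
        self.pool.put(page_id, content)  # page body lives only in the pool
        self.nodes[page_id] = NodeEntry(url, title, valid_until)


def synchronize(tables: List[WebTable], page_id: str) -> Optional[int]:
    """On request, give every table the same valid time for a shared page.

    Assumed policy: adopt the latest valid time found among the tables.
    """
    entries = [t.nodes[page_id] for t in tables if page_id in t.nodes]
    if not entries:
        return None
    latest = max(e.valid_until for e in entries)
    for entry in entries:
        entry.valid_until = latest
    return latest


if __name__ == "__main__":
    pool = PagePool()
    t1, t2 = WebTable("t1", pool), WebTable("t2", pool)
    t1.add_node("p1", "http://example.org/", "Home", "<html>v1</html>", 10)
    t2.add_node("p1", "http://example.org/", "Home", "<html>v1</html>", 20)
    print(len(pool.pages), synchronize([t1, t2], "p1"))
```

In this sketch the two tables hold their own attribute records and valid times for page `p1`, while the page body is stored once in the shared pool; `synchronize` is called explicitly, mirroring the on-request resolution mentioned in the abstract.
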
This work was supported in part by the Nanyang Technological University, Ministry of Education (Singapore) under Academic Research Fund #4-12034-5060, #4-12034-3012, #4-12034-6022. Any opinions, findings, and recommendations in this paper are those of the authors and do not reflect the views of the funding agencies.
