In this paper, we present the storage management of the WHOWEDA web warehousing system, which warehouses historical web information.
To facilitate inter-table and intra-table sharing of web pages, we propose a three-layer storage architecture consisting
of tuple, table, and pool layers, whose storage modules hold different parts of the warehoused web information. To improve
retrieval efficiency, we replicate some node attributes across web tables in the table layer, while keeping only a single
copy of each web page in the pool layer. Separating table-layer from pool-layer storage also allows multiple web tables to
maintain different valid times for the same web page, since each web table may follow its own global coupling schedule.
Because sharing web pages may lead to valid-time inconsistencies between web tables, we propose an update synchronization
scheme that resolves these valid-time differences on user request.
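The interplay among the layers and the on-request valid-time synchronization can be illustrated with a small in-memory model. The following is a minimal Python sketch under simplifying assumptions, not the WHOWEDA implementation. All names (`Pool`, `PoolPage`, `WebTable`, `NodeRef`, `couple`, `stale_nodes`, `synchronize`) are hypothetical. The page title stands in for a replicated node attribute, and the pool keeps only the latest version of each URL.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class PoolPage:
    """Pool layer: the single stored copy of a web page (hypothetical: latest version only)."""
    url: str
    title: str
    content: str
    last_modified: date


@dataclass
class NodeRef:
    """Table layer: a web table's reference to a pooled page."""
    url: str
    title: str        # replicated node attribute, copied for faster retrieval (assumed)
    valid_from: date  # this table's own valid time for the page


@dataclass
class WebTable:
    name: str
    nodes: Dict[str, NodeRef] = field(default_factory=dict)
    # Tuple layer: each web tuple lists the URLs of the nodes it contains.
    tuples: List[Tuple[str, ...]] = field(default_factory=list)


class Pool:
    def __init__(self) -> None:
        self.pages: Dict[str, PoolPage] = {}

    def store(self, url: str, title: str, content: str, fetched: date) -> PoolPage:
        """Keep one copy per URL; replace it only when the content has changed."""
        page = self.pages.get(url)
        if page is None or page.content != content:
            page = PoolPage(url, title, content, fetched)
            self.pages[url] = page
        return page


def couple(table: WebTable, pool: Pool, url: str, title: str,
           content: str, when: date) -> None:
    """One table's global coupling: refresh the pool, update only this table's valid time."""
    page = pool.store(url, title, content, when)
    table.nodes[url] = NodeRef(url, page.title, page.last_modified)


def stale_nodes(table: WebTable, pool: Pool) -> List[str]:
    """URLs whose valid time in this table lags behind the shared pooled copy."""
    return [u for u, n in table.nodes.items()
            if n.valid_from < pool.pages[u].last_modified]


def synchronize(table: WebTable, pool: Pool) -> List[str]:
    """On user request: align valid times and replicated attributes with the pool."""
    updated = stale_nodes(table, pool)
    for u in updated:
        page = pool.pages[u]
        table.nodes[u] = NodeRef(u, page.title, page.last_modified)
    return updated


if __name__ == "__main__":
    pool = Pool()
    a, b = WebTable("A"), WebTable("B")
    url = "http://x.org/p"
    couple(a, pool, url, "Old title", "<html>v1</html>", date(2000, 1, 1))
    couple(b, pool, url, "Old title", "<html>v1</html>", date(2000, 1, 1))
    # Table A's coupling schedule picks up a new version; table B has not coupled yet.
    couple(a, pool, url, "New title", "<html>v2</html>", date(2000, 2, 1))
    print(stale_nodes(b, pool))   # ['http://x.org/p']
    print(synchronize(b, pool))   # ['http://x.org/p']
    print(b.nodes[url].title)     # New title
```

In this model, table A's coupling replaces the shared pooled copy, so table B's recorded valid time falls behind the page it references. `synchronize` closes that gap only when invoked, mirroring the on-request resolution described in the abstract.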
This work was supported in part by the Nanyang Technological University, Ministry of Education (Singapore) under Academic
Research Fund #4-12034-5060, #4-12034-3012, #4-12034-6022. Any opinions, findings, and recommendations in this paper are those
of the authors and do not reflect the views of the funding agencies.