Abstract

This paper presents estimation methods that compute the probabilities of how many times web pages will be downloaded and modified, respectively, in future crawls. These methods help web database administrators avoid unnecessary requests for undownloadable and unmodified web pages in a page group. We postulate that the future change behavior of web pages is strongly related to their past change behavior. We gathered the change histories of approximately three million web pages at two-day intervals over 100 days and estimated the future change behavior of those pages. Our estimates, evaluated against the actual change behavior of the pages, worked well.

Keywords: web page change estimation, web database administration

This work was supported by a Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2006-214-D00136).
