This paper presents estimation methods that compute the probabilities of how many times web pages will be downloaded and modified,
respectively, in future crawls. The methods help web database administrators avoid unnecessary requests for pages in a page group
that cannot be downloaded or have not been modified. We postulate that the future change behavior of web pages is strongly related
to their past change behavior. We gathered the change histories of approximately three million web pages at two-day intervals for
100 days and estimated the future change behavior of those pages. Our estimates, evaluated against the actual change behavior of
the pages, worked well.
Keywords: web page change estimation, web database administration
This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2006-214-D00136).