Communications in Computer and Information Science, 2009, Volume 6, Part 1, Part 14, 535-542, DOI: 10.1007/978-3-540-89985-3_66

Performance Modeling of a Distributed Web Crawler Using Stochastic Activity Networks

Mitra Nasri, Saeed Shariati and Mohammad Abdollahi Azgomi

Abstract

One of the basic requirements of Web mining is a crawler system, which collects information from the Web. Predicting the performance, dependability and other operational measures of a system requires constructing and evaluating a formal model of that system. We have constructed a formal model of a distributed crawler based on UbiCrawler, using stochastic activity networks (SANs). The constructed SAN model is used to evaluate several performance measures of the crawler. The throughput results are the same as the published statistics of UbiCrawler. In addition, we have been able to evaluate two other measures: communication overhead and coverage. In this paper, we first discuss the architecture of the distributed crawler, then present a SAN model of the crawler and the results of its evaluation.

Keywords: Web crawler, performance modeling, stochastic activity networks