Abstract

With the explosive growth of the World Wide Web, it is becoming increasingly difficult for users to discover Web pages that are relevant to a topic. To address this problem we are developing a system that allows the collection and analysis of Web pages related to a particular topic. In this paper we present the system's overall architecture and introduce the focused crawler used by the system. We also discuss the various techniques we use to allow the user to analyze and gain useful insights about a collection. Finally, we present some statistics on the collections.

Keywords: Focused crawling - Graph algorithms - Hubs - Site graph analysis
