Lecture Notes in Computer Science, 2005, Volume 3611/2005, 432, DOI: 10.1007/11539117_98

Characterization of Evaluation Metrics in Topical Web Crawling Based on Genetic Algorithm

Tao Peng, Wanli Zuo and Yilin Liu

Abstract

Topical crawlers are becoming important tools for supporting applications such as specialized Web portals, online searching, and competitive intelligence. A topic-driven crawler chooses the best URLs to pursue during web crawling, but it is difficult to evaluate which of the downloaded URLs are the best. This paper presents several important metrics and an evaluation function for ranking URLs by page relevance. We also discuss an approach to evaluating the function based on a genetic algorithm (GA): the GA's evolutionary process can discover the best combination of the metrics' weights. To avoid results being skewed by a single topic, this paper presents a method in which the characteristics of the metrics' combination are extracted by mining frequent patterns. Feature extraction adopts a novel FP-tree structure and the FP-growth mining method, which works on the FP-tree without candidate generation. The experiments show that the performance is encouraging, especially for a popular topic.
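
The idea of letting a GA search for the metric weights can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the three unnamed metrics and their toy values in SAMPLES, the linear form of score, the pairwise-ranking fitness, and the GA settings (truncation selection, one-point crossover, Gaussian mutation). The paper's actual metrics, evaluation function, and fitness measure may differ.

```python
import random

# Hypothetical toy data: (metric values for one URL, relevance label).
# The three metrics and all numbers are illustrative assumptions.
SAMPLES = [
    ((0.9, 0.2, 0.7), 1),
    ((0.8, 0.1, 0.6), 1),
    ((0.7, 0.3, 0.9), 1),
    ((0.2, 0.9, 0.3), 0),
    ((0.1, 0.8, 0.4), 0),
    ((0.3, 0.7, 0.1), 0),
]


def score(weights, metrics):
    """Assumed evaluation function: a weighted sum of the metric values."""
    return sum(w * m for w, m in zip(weights, metrics))


def fitness(weights):
    """Fraction of (relevant, irrelevant) pairs that the weights rank correctly."""
    pos = [score(weights, m) for m, label in SAMPLES if label == 1]
    neg = [score(weights, m) for m, label in SAMPLES if label == 0]
    correct = sum(1 for p in pos for n in neg if p > n)
    return correct / (len(pos) * len(neg))


def normalize(weights):
    """Rescale weights to sum to 1; fall back to uniform if all are zero."""
    total = sum(weights)
    if total <= 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def evolve(pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Evolve a weight vector and return the fittest individual found."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    n = len(SAMPLES[0][0])
    population = [normalize([rng.random() for _ in range(n)])
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:
                i = rng.randrange(n)  # Gaussian mutation of one weight
                child[i] = max(0.0, child[i] + rng.gauss(0, 0.1))
            children.append(normalize(child))
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("weights:", [round(w, 3) for w in best])
    print("fitness:", fitness(best))
```

Because the fitter half of each generation is carried over unchanged, the best fitness in the population never decreases from one generation to the next. In a real crawler the toy SAMPLES would be replaced by metric values measured on downloaded pages whose relevance has been judged.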
