Abstract

To help users with more specific questions access on-line learning materials, we build a learning portal named Fusion. First, we develop FusionCrawler, a focused crawler based on link classification, to download potential course pages. We then use a binary classifier to pick out the actual course pages. Once the course pages are identified, we use FusionExtractor, a DOM-tree-based regular expression wrapper, to extract their metadata, including Course Name, Instructor Information, Course Outline, and other relevant fields; the metadata are stored in a database behind the portal. Experimental results show that our approach to organizing on-line courses through focused crawling and metadata extraction is effective: FusionCrawler retrieved on average 40-50% more on-topic learning materials than a conventional focused crawler, and FusionExtractor achieved an average F1 of 85%. With metadata for more than 1,400 MIT OCW, 3,000 UIUC, and 1,000 WISC courses; 300 courses from GreatLearning with 3,000 Chinese course videos; and nearly 1,000 videos from the Internet Archive, the Fusion portal offers several search functions, such as quick search, advanced search, and semantic navigation browsing.
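The abstract does not describe how FusionCrawler's link classifier works. As a rough illustration only, the following sketch shows the general shape of a link-classification focused crawler: a best-first frontier in which every outgoing link is scored by a classifier and only promising links are followed. The keyword scorer `score_link`, the `COURSE_CUES` list, the in-memory `TOY_WEB` graph, and the `is_course_page` stand-in for the binary page classifier are all hypothetical; a real system would use a trained classifier and fetch pages over HTTP.

```python
import heapq
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical cue words; FusionCrawler's real link classifier is not described in the abstract.
COURSE_CUES = ("course", "syllabus", "lecture", "instructor", "homework")

Link = Tuple[str, str]  # (anchor_text, target_url)


def score_link(anchor_text: str, url: str) -> float:
    """Toy link classifier: fraction of cue words present in the anchor text or URL."""
    text = f"{anchor_text} {url}".lower()
    hits = sum(1 for cue in COURSE_CUES if cue in text)
    return hits / len(COURSE_CUES)


def focused_crawl(
    seeds: List[str],
    fetch_links: Callable[[str], List[Link]],
    is_course_page: Callable[[str], bool],
    max_pages: int = 100,
    threshold: float = 0.0,
) -> Tuple[List[str], Set[str]]:
    """Best-first crawl that always expands the highest-scoring unvisited link.

    Links scoring at or below `threshold` are never queued, which keeps the
    crawl focused. Returns (course pages found, all URLs fetched).
    """
    # heapq is a min-heap, so scores are negated; seeds get the top priority.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    visited: Set[str] = set()
    course_pages: List[str] = []

    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        if is_course_page(url):  # stand-in for the binary page classifier
            course_pages.append(url)
        for anchor, target in fetch_links(url):
            if target in visited:
                continue
            score = score_link(anchor, target)
            if score > threshold:
                heapq.heappush(frontier, (-score, target))

    return course_pages, visited


# Hypothetical in-memory "web" used instead of real HTTP fetches.
TOY_WEB: Dict[str, List[Link]] = {
    "http://example.edu/": [
        ("Course catalog", "http://example.edu/courses"),
        ("Campus news", "http://example.edu/news"),
    ],
    "http://example.edu/courses": [
        ("CS101 syllabus", "http://example.edu/cs101"),
        ("Parking", "http://example.edu/parking"),
    ],
    "http://example.edu/cs101": [
        ("Lecture notes", "http://example.edu/cs101/lectures"),
    ],
    "http://example.edu/news": [
        ("Football scores", "http://example.edu/football"),
    ],
}
KNOWN_COURSE_PAGES = {"http://example.edu/cs101"}

pages, fetched = focused_crawl(
    ["http://example.edu/"],
    fetch_links=lambda url: TOY_WEB.get(url, []),
    is_course_page=lambda url: url in KNOWN_COURSE_PAGES,
)
print(pages)  # ['http://example.edu/cs101']
```

Because off-topic links such as "Campus news" and "Parking" score zero here, they are never queued, so the crawler spends its page budget on the course-related part of the site. Replacing `score_link` with a trained classifier over anchor text, URL tokens, and surrounding context would be the natural next step in this sketch.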

Keywords: Focused Crawling - Metadata Extraction - Learning Object Management - Ontology
