To help users find online learning materials that answer specific questions, we built a learning portal named Fusion. First,
we developed FusionCrawler, a focused crawler based on link classification, to download candidate course pages. We then use a
binary classifier to identify the actual course pages among them. From the identified course pages, FusionExtractor, a
regular-expression wrapper based on the DOM tree, extracts metadata. The metadata include Course Name, Instructor Information,
Course Outline, and other relevant information, and are stored in a database behind the portal. Experimental results show that
our approach to organizing online courses, based on focused crawling and metadata extraction, is effective. FusionCrawler
retrieved on average 40-50% more on-topic learning materials than a normal focused crawler, and FusionExtractor achieved an
average F1 of 85%. With metadata for more than 1,400 MIT OCW, 3,000 UIUC, and 1,000 WISC courses; 300 courses from
GreatLearning with 3,000 Chinese course videos; and nearly 1,000 videos from the Internet Archive, the Fusion portal provides
several search functions, such as quick search, advanced search, and semantic navigation browsing.
Keywords Focused Crawling - Metadata Extraction - Learning Object Management - Ontology
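
The extraction step summarized above, a regular-expression wrapper over the DOM tree, can be illustrated with a minimal sketch. This is not the FusionExtractor implementation: the class name TextNodeCollector, the function extract_metadata, the RULES table, and its patterns are all hypothetical, chosen only to show how a tag path in the DOM can be paired with a regular expression on the text found there.

```python
import re
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    """Collect (tag path, text) pairs for every non-empty text node."""

    # Elements without a closing tag; they never enter the path.
    VOID = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.path = []   # currently open tags, outermost first
        self.nodes = []  # e.g. ("html/body/h1", "Some title")

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.path.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.path:
            while self.path and self.path.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.path), text))


# Hypothetical rules: field -> (pattern on the tag path, pattern on the text).
# Group 1 of the text pattern is taken as the field value.
RULES = {
    "course_name": (re.compile(r"(^|/)(h1|title)$"), re.compile(r"(.+)")),
    "instructor": (
        re.compile(r"(^|/)(p|li|td|div|span)$"),
        re.compile(r"Instructors?:\s*(.+)", re.IGNORECASE),
    ),
}


def extract_metadata(html):
    """Return the first match per field, scanning text nodes in document order."""
    parser = TextNodeCollector()
    parser.feed(html)
    parser.close()
    record = {}
    for field, (path_re, text_re) in RULES.items():
        for path, text in parser.nodes:
            match = text_re.search(text) if path_re.search(path) else None
            if match:
                record[field] = match.group(1).strip()
                break
    return record


if __name__ == "__main__":
    page = (
        "<html><head><title>Intro to Algorithms</title></head>"
        "<body><h1>Intro to Algorithms</h1>"
        "<p>Instructor: Jane Doe</p></body></html>"
    )
    print(extract_metadata(page))
```

Constraining each regular expression to a tag path is what distinguishes this from matching on raw page text: the same phrase in a navigation bar or footer is ignored unless it sits under an element the rule names. A real wrapper would need site-specific rules and more fields, such as the course outline.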