
Grouping Web Pages about Persons and Organizations for Information Extraction

Shiren Ye, Tat-seng Chua, Jimin Liu and Jeremy R. Kei

School of Computing, National University of Singapore, 117543, Singapore
Abstract
Information extraction on the Web permits users to retrieve specific information on a person or an organization. As names are non-unique, the same name may be mapped to multiple entities. The aim of this paper is to describe an algorithm to cluster Web pages returned by search engines so that pages belonging to different entities are clustered into different groups. The algorithm uses named entities as the features to divide the document set into direct and indirect pages. It then uses distinct direct pages as seeds of clusters to group indirect pages into different clusters. The algorithm has been found to be effective for Web-based applications.
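
The seed-based grouping outlined in the abstract can be illustrated with a minimal sketch. The abstract does not say how direct pages are recognized, which similarity measure is used, or how duplicate direct pages are detected, so everything below is an assumption made for illustration: each page is reduced to a set of named-entity strings, the direct/indirect decision is delegated to a caller-supplied predicate `is_direct`, similarity is Jaccard overlap, and `merge_threshold` is a hypothetical cutoff for deciding that two direct pages describe the same entity. It is not the authors' implementation.

```python
from typing import Callable, Dict, FrozenSet, List

Entities = FrozenSet[str]


def jaccard(a: Entities, b: Entities) -> float:
    """Overlap of two named-entity sets, in [0, 1] (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_pages(
    pages: Dict[str, Entities],
    is_direct: Callable[[str], bool],
    merge_threshold: float = 0.5,  # hypothetical cutoff, not from the paper
) -> List[List[str]]:
    """Group page ids into clusters seeded by distinct direct pages."""
    # Step 1: split the result set into direct and indirect pages.
    direct = [p for p in pages if is_direct(p)]
    indirect = [p for p in pages if not is_direct(p)]

    # Step 2: keep only distinct direct pages as seeds. A direct page that
    # closely resembles an existing seed joins that seed's cluster instead.
    seeds: List[str] = []
    clusters: Dict[str, List[str]] = {}
    for p in direct:
        best = max(seeds, key=lambda s: jaccard(pages[p], pages[s]), default=None)
        if best is not None and jaccard(pages[p], pages[best]) >= merge_threshold:
            clusters[best].append(p)
        else:
            seeds.append(p)
            clusters[p] = [p]

    # Without any seed there is nothing to attach to; return the indirect
    # pages as one undivided group (a simplification of this sketch).
    if not seeds:
        return [indirect] if indirect else []

    # Step 3: attach every indirect page to its most similar seed.
    for p in indirect:
        best = max(seeds, key=lambda s: jaccard(pages[p], pages[s]))
        clusters[best].append(p)

    return [clusters[s] for s in seeds]


# Toy data: two different people who share the name "John Smith".
example_pages = {
    "home_a": frozenset({"John Smith", "MIT", "Boston"}),
    "home_b": frozenset({"John Smith", "Acme Corp", "Sydney"}),
    "news_1": frozenset({"John Smith", "MIT"}),
    "news_2": frozenset({"John Smith", "Sydney", "Acme Corp", "Qantas"}),
}
example_groups = group_pages(example_pages, lambda p: p.startswith("home"))
print(example_groups)  # [['home_a', 'news_1'], ['home_b', 'news_2']]
```

In this toy run the two home pages overlap only on the shared name, so both become seeds, and each news page is attached to the seed whose named entities it shares most. A real system would need an actual named-entity recognizer and a principled test for direct pages, neither of which is specified in the abstract.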

Shiren Ye
Email: yesr@comp.nus.edu.sg

Tat-seng Chua
Email: chuats@comp.nus.edu.sg

Jimin Liu
Email: liujm@comp.nus.edu.sg

Jeremy R. Kei
Email: jkei@comp.nus.edu.sg