Book Chapter
Grouping Web Pages about Persons and Organizations for Information Extraction
Book Series: Lecture Notes in Computer Science
Publisher: Springer Berlin / Heidelberg
ISSN: 0302-9743 (Print), 1611-3349 (Online)
Volume: 2555/2010
Book: Digital Libraries: People, Knowledge, and Technology
Book DOI: 10.1007/3-540-36227-4
Copyright: 2010
ISBN: 978-3-540-00261-1
Chapter DOI: 10.1007/3-540-36227-4_24
Pages: 241-251
Subject Collection: Computer Science
SpringerLink Date: Tuesday, January 01, 2002
Shiren Ye, Tat-seng Chua, Jimin Liu and Jeremy R. Kei
School of Computing, National University of Singapore, 117543, Singapore
Abstract
Information extraction on the Web permits users to retrieve specific information on a person or an organization. Because names are not unique, the same name may refer to multiple entities. This paper describes an algorithm that clusters the Web pages returned by search engines so that pages belonging to different entities are placed in different groups. The algorithm uses named entities as features to divide the document set into direct and indirect pages. It then uses distinct direct pages as cluster seeds and assigns the indirect pages to those clusters. The algorithm has been found to be effective for Web-based applications.
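The seed-based grouping outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how direct pages are identified, how distinctness is measured, or how indirect pages are matched to seeds. The sketch therefore assumes that each page arrives with a pre-extracted set of named entities and a hypothetical `is_direct` flag, that similarity is Jaccard overlap between entity sets, and that a hypothetical `merge_threshold` decides whether two direct pages are distinct. All names here (`Page`, `jaccard`, `pick_seeds`, `group_pages`) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    entities: frozenset  # named entities on the page (assumed already extracted)
    is_direct: bool      # assumed flag: the page is mainly about the queried name


def jaccard(a, b):
    """Overlap between two entity sets, in [0, 1] (an assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pick_seeds(direct_pages, merge_threshold=0.5):
    """Keep each direct page that is dissimilar to every seed chosen so far."""
    seeds = []
    for page in direct_pages:
        if all(jaccard(page.entities, s.entities) < merge_threshold for s in seeds):
            seeds.append(page)
    return seeds


def group_pages(pages, merge_threshold=0.5):
    """Return {seed url: [pages]}: one cluster per distinct direct page."""
    seeds = pick_seeds([p for p in pages if p.is_direct], merge_threshold)
    clusters = {seed.url: [seed] for seed in seeds}
    if not seeds:
        return clusters
    seed_urls = set(clusters)
    for page in pages:
        if page.url in seed_urls:
            continue
        # Attach every remaining page (indirect, or a direct page that was
        # too similar to an existing seed) to its most similar seed.
        best = max(seeds, key=lambda s: jaccard(page.entities, s.entities))
        clusters[best.url].append(page)
    return clusters
```

In this sketch a page with no entity overlap with any seed is still attached to the first seed; a fuller version would probably set such pages aside as unassigned.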
Shiren Ye, Email: yesr@comp.nus.edu.sg
Tat-seng Chua, Email: chuats@comp.nus.edu.sg
Jimin Liu, Email: liujm@comp.nus.edu.sg
Jeremy R. Kei, Email: jkei@comp.nus.edu.sg