Information resources on the Internet, including Web pages and news articles, constitute a huge, ill-structured, and continuously
growing information space.
Knowledge discovery from the Internet is challenging. This space contains useful knowledge that is difficult to exploit automatically, for the following reasons:
– The Internet is full of junk pages and articles. Conventional techniques rarely tolerate information as noisy as that often
found on the Internet.
– A single page or article on the net is often too fine-grained as a unit of knowledge. We need techniques to extract a cluster
of inter-related fine-grained pages and/or articles with semantic relationships among them.