Extracting Information from the Web for Concept Learning and Collaborative Filtering
(Extended Abstract)
William W. Cohen
WhizBang! Labs - Research, 4616 Henry Street, Pittsburgh, PA 15213
Abstract
Previous work on extracting information from the web generally makes few assumptions about how the extracted information will
be used. As a consequence, the goal of web-based extraction systems is usually taken to be the creation of high-quality, noise-free
data with clear semantics. This is a difficult problem which cannot be completely automated. Here we consider instead the
problem of extracting web data for certain machine learning systems: specifically, collaborative filtering (CF) and concept
learning (CL) systems. CF and CL systems are highly tolerant of noisy input, and hence much simpler extraction systems can
be used in this context. For CL, we will describe a simple method that uses a given set of web pages to construct new features,
which reduce the error rate of learned classifiers in a wide variety of situations. For CF, we will describe a simple method
that automatically collects useful information from the web without any human intervention. The collected information, represented
as “pseudo-users”, can be used to “jumpstart” a CF system when the user base is small (or even absent).
The work described here was conducted while the author was employed by AT&T Labs - Research.
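As a rough illustration of the "pseudo-user" idea, the sketch below treats each web page that mentions two or more known items as one pseudo-user who likes those items, then uses these pseudo-users as the community for a simple similarity-weighted recommender. The abstract does not specify how pseudo-users are built or scored, so the substring matching, the cosine weighting, the function names, and the toy pages are all assumptions made for this example, not the author's method.

```python
from collections import defaultdict
from math import sqrt


def pages_to_pseudo_users(pages, known_items):
    """Turn each page into a pseudo-user: the set of known items its text mentions.

    Hypothetical helper: plain substring matching stands in for a real extractor.
    """
    pseudo_users = {}
    for page_id, text in pages.items():
        lowered = text.lower()
        liked = {item for item in known_items if item.lower() in lowered}
        # A page naming a single item carries no co-occurrence signal, so skip it.
        if len(liked) >= 2:
            pseudo_users[f"pseudo:{page_id}"] = liked
    return pseudo_users


def cosine(a, b):
    """Cosine similarity between two item sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(target_likes, community, top_n=3):
    """Score unseen items by similarity-weighted votes from the community."""
    scores = defaultdict(float)
    for likes in community.values():
        weight = cosine(target_likes, likes)
        if weight == 0.0:
            continue
        for item in likes - target_likes:
            scores[item] += weight
    # Sort by descending score, breaking ties alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Made-up toy pages; no real users are needed to produce recommendations.
pages = {
    "page1": "Fans of Miles Davis often enjoy John Coltrane and Bill Evans.",
    "page2": "Top picks: John Coltrane, Thelonious Monk.",
    "page3": "A review of a Bill Evans record.",
}
known_items = ["Miles Davis", "John Coltrane", "Bill Evans", "Thelonious Monk"]

pseudo_users = pages_to_pseudo_users(pages, known_items)
suggestions = recommend({"Miles Davis"}, pseudo_users)
print(suggestions)
```

Because noisy matches only add weak or spurious votes rather than corrupting a structured record, a crude extractor of this kind may still be usable, which is the tolerance to noise the abstract relies on. Real users, once available, could simply be added to the same community dictionary alongside the pseudo-users.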