Some Advances in Data-Mining Techniques
Jeffrey D. Ullman
Department of Computer Science, Stanford University, Stanford, CA 94305, USA
Abstract
Research in the MIDAS project at Stanford explores new ideas in data-mining. One early result was a new algorithm for Web
search, which led to a recently commercialized search engine called Google.
A second area of interest is generalizing techniques such as “a-priori,” which were developed by Rakesh Agrawal and
his associates at IBM Research in Almaden to allow “market-basket analysis,” or “association-rule mining.” The latter problem
deals with finding items that customers frequently buy together. We have developed a framework called “query flocks.” In this
system, we can phrase highly complex data-mining queries, including many that are not handled well by commercial SQL systems. We
then compile the “query flock” into a sequence of SQL queries that are simple enough to be optimized by commercial systems.
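To make the market-basket problem concrete, here is a minimal Python sketch of the level-wise a-priori idea: count single items, keep those that meet a support threshold, then build larger candidates only from smaller frequent sets. The function name `frequent_itemsets`, the `min_support` parameter, and the sample baskets are illustrative assumptions, not part of the MIDAS work, and the sketch does not show the query-flock compilation into SQL.

```python
from itertools import combinations


def frequent_itemsets(baskets, min_support):
    """Return {itemset: count} for every itemset found in >= min_support baskets.

    Hypothetical helper for illustration; in-memory and unoptimized.
    """
    baskets = [frozenset(b) for b in baskets]

    # Pass 1: count individual items.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        prev = set(current)
        # A size-k candidate is kept only if all of its (k-1)-subsets are
        # frequent; this pruning step is the core of a-priori.
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in prev for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # Pass k: count only the surviving candidates.
        counts = {c: 0 for c in candidates}
        for basket in baskets:
            for c in candidates:
                if c <= basket:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result


if __name__ == "__main__":
    sample = [
        {"bread", "milk"},
        {"bread", "diapers", "beer"},
        {"milk", "diapers", "beer"},
        {"bread", "milk", "diapers", "beer"},
        {"bread", "milk", "diapers"},
    ]
    for itemset, count in frequent_itemsets(sample, min_support=3).items():
        print(sorted(itemset), count)
```

The pruning step is what keeps the candidate space manageable: a pair is counted only if both of its items are frequent on their own, and so on for larger sets.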
A third interesting challenge is summarizing the knowledge of the Web in a form that resembles conventional relational data.
We describe some experiments that have been carried out to exploit the redundancy of the Web and discover the patterns in
which facts of a certain kind tend to exist.
Finally, we shall talk about extending the techniques for association-rule mining to extract relationships that are not based
on “high support,” i.e., sets of items that appear very frequently in market baskets. Important examples include intelligence-gathering,
where we want to find terms that are highly correlated in documents, but that do not appear in very many documents. The MIDAS
group has recently developed some techniques to process very large amounts of data and efficiently detect items that are highly
correlated but not very frequent. We can even find implications, similar to causal relationships, without requiring high support
for the associated items.
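The goal of finding terms that are highly correlated but rare can be illustrated with a small brute-force sketch. The abstract does not name the correlation measure or the efficient technique, so this example assumes Jaccard similarity between the sets of documents containing each term and compares all pairs directly. The function name `correlated_pairs`, the threshold, and the toy documents are hypothetical, and this approach would not scale to the data sizes the abstract has in mind.

```python
from itertools import combinations


def correlated_pairs(documents, min_similarity):
    """Return (term_a, term_b, similarity) for term pairs whose document sets
    have Jaccard similarity >= min_similarity, with no support threshold.

    Brute-force illustration only; not the MIDAS group's method.
    """
    # Map each term to the set of document indices that contain it.
    postings = {}
    for doc_id, terms in enumerate(documents):
        for term in set(terms):
            postings.setdefault(term, set()).add(doc_id)

    pairs = []
    for a, b in combinations(sorted(postings), 2):
        shared = len(postings[a] & postings[b])
        either = len(postings[a] | postings[b])
        similarity = shared / either
        if similarity >= min_similarity:
            pairs.append((a, b, similarity))
    return pairs


if __name__ == "__main__":
    docs = [
        ["the", "quagga", "zebra"],
        ["the", "weather"],
        ["the", "zebra", "quagga"],
        ["the", "sports"],
        ["the", "weather", "sports"],
    ]
    print(correlated_pairs(docs, min_similarity=0.9))
```

In the toy data, "quagga" and "zebra" appear in only two of five documents, so a high-support filter would discard them, yet they always occur together. The very frequent term "the" is not reported with any partner because its document set is much larger than theirs.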