
Some Advances in Data-Mining Techniques

Jeffrey D. Ullman

Department of Computer Science, Stanford University, Stanford, CA 94305, USA
Abstract
Research in the MIDAS project at Stanford explores new ideas in data mining. One early result was a new algorithm for Web search, which led to Google, a search engine that has recently become commercial.
A second area of interest is generalizing techniques such as “a-priori,” which were developed by Rakesh Agrawal and his associates at IBM Research in Almaden to support “market-basket analysis,” or “association-rule mining.” That problem deals with finding items that customers frequently buy together. We have developed a framework called “query flocks,” in which we can phrase highly complex data-mining queries, including many that are not handled well by commercial SQL systems. We then compile the query flock into a sequence of SQL queries that are simple enough to be optimized by commercial systems.
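To make the market-basket problem concrete, the following is a minimal sketch of the a-priori idea restricted to pairs of items. It is an illustration only, not the MIDAS or IBM implementation: the function name `frequent_pairs`, the `min_support` parameter, and the sample baskets are all assumptions made for this example. The key step is that a pair is counted only if both of its items are frequent on their own.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that occur in at least min_support baskets.

    Hypothetical helper for illustration; names and signature are assumed.
    """
    # Pass 1: count the number of baskets containing each individual item.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Pass 2: count only pairs whose members are both frequent, since a
    # pair cannot be frequent unless each of its items is frequent.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {pair: c for pair, c in pair_counts.items() if c >= min_support}


# Made-up sample data.
baskets = [
    ["bread", "milk", "beer"],
    ["bread", "milk"],
    ["milk", "beer"],
    ["bread", "milk", "diapers"],
]
result = frequent_pairs(baskets, min_support=2)
print(result)
```

With these sample baskets, "diapers" is dropped after the first pass, so no pair containing it is ever counted in the second pass.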
A third interesting challenge is summarizing the knowledge of the Web in a form that resembles conventional relational data. We describe some experiments that have been carried out to exploit the redundancy of the Web and discover the patterns in which facts of a certain kind tend to exist.
Finally, we shall talk about extending the techniques for association-rule mining to extract relationships that are not based on “high support,” i.e., on sets of items that appear very frequently in market baskets. Important examples include intelligence gathering, where we want to find terms that are highly correlated in documents but that do not appear in very many documents. The MIDAS group has recently developed some techniques to process very large amounts of data and efficiently detect items that are highly correlated but not very frequent. We can even find implications, similar to causal relationships, without requiring high support for the associated items.
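The goal of finding correlated but infrequent items can be illustrated with a brute-force sketch. This is not the MIDAS technique, which the abstract does not describe; it compares every pair of terms directly and would not scale to very large data. It also assumes Jaccard similarity of the document sets as the measure of correlation. The function name `similar_terms`, the `min_similarity` parameter, and the sample documents are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def similar_terms(documents, min_similarity):
    """Return term pairs whose document sets have high Jaccard similarity.

    Brute-force illustration only; no support threshold is applied.
    """
    # Map each term to the set of document ids in which it appears.
    occurrences = defaultdict(set)
    for doc_id, terms in enumerate(documents):
        for term in terms:
            occurrences[term].add(doc_id)

    result = {}
    for a, b in combinations(sorted(occurrences), 2):
        shared = len(occurrences[a] & occurrences[b])
        total = len(occurrences[a] | occurrences[b])
        similarity = shared / total
        if similarity >= min_similarity:
            result[(a, b)] = similarity
    return result


# Made-up sample data: "the" is frequent, the interesting pair is rare.
documents = [
    {"the", "plutonium", "centrifuge"},
    {"the", "weather"},
    {"the", "centrifuge", "plutonium"},
    {"the", "weather", "sports"},
    {"the"},
]
pairs = similar_terms(documents, min_similarity=0.9)
print(pairs)
```

In this sample, "plutonium" and "centrifuge" each appear in only two of five documents, so a high-support filter would be likely to discard them, yet they always occur together.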

Contact Information Jeffrey D. Ullman
Email: ullman@cs.stanford.edu