
Automatic Classification of Text Databases through Query Probing

Panagiotis G. Ipeirotis, Luis Gravano and Mehran Sahami

(6)  Computer Science Department, Columbia University, 1214 Amsterdam Avenue, Mailcode: 0401, New York, NY 10027-7003, USA
(7)  E.piphany, Inc., 1900 South Norfolk Street, Suite 310, San Mateo, CA 94403, USA
Abstract
Many text databases on the web are “hidden” behind search interfaces, and their documents are accessible only through querying. Traditional search engines typically ignore the contents of such search-only databases. Recently, Yahoo-like directories have started to manually organize these databases into categories that users can browse to find these valuable resources. We propose a novel strategy to automate the classification of search-only text databases. Our technique starts by training a rule-based document classifier, and then uses the classifier’s rules to generate probing queries. The queries are sent to the text databases, which are then classified based on the number of matches that they produce for each query. We report some initial exploratory experiments suggesting that our approach is a promising way to automatically characterize the contents of text databases accessible on the web.
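The probing idea described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the rules in `RULES`, the category names, the in-memory stand-in for a search interface (`make_toy_database`), and the decision rule in `classify_database`, which simply picks the category whose probes produce the most matches in total.

```python
from collections import Counter

# Hypothetical classifier rules: each rule is a conjunction of words that
# implies a category. In the approach described above, such rules would come
# from a trained rule-based document classifier.
RULES = [
    (("ibm", "computer"), "Computers"),
    (("lung", "cancer"), "Health"),
    (("hepatitis",), "Health"),
]


def make_toy_database(documents):
    """Return a match-count function that mimics a search-only interface.

    The returned function reports how many documents contain all query
    words, without exposing the documents themselves.
    """
    tokenized = [set(doc.lower().split()) for doc in documents]

    def match_count(query_words):
        return sum(1 for words in tokenized
                   if all(w in words for w in query_words))

    return match_count


def classify_database(match_count, rules=RULES):
    """Probe a database with one query per rule and tally matches per category.

    Assumed decision rule: choose the category with the largest total number
    of matches, or None if no probe matched anything.
    """
    totals = Counter()
    for query_words, category in rules:
        totals[category] += match_count(query_words)
    if not totals or max(totals.values()) == 0:
        return None, dict(totals)
    best = max(totals, key=totals.get)
    return best, dict(totals)
```

A toy database is built with `make_toy_database([...])` and passed to `classify_database`, which returns the chosen category together with the per-category match totals. Only match counts are used, so no documents need to be retrieved from the database being classified.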

Panagiotis G. Ipeirotis
Email: pirot@cs.columbia.edu

Luis Gravano
Email: gravano@cs.columbia.edu

Mehran Sahami
Email: sahami@epiphany.com