Automatic Classification of Text Databases through Query Probing
Panagiotis G. Ipeirotis (6), Luis Gravano (6) and Mehran Sahami (7)
(6) Computer Science Department, Columbia University, 1214 Amsterdam Avenue, Mailcode: 0401, New York, NY 10027-7003, USA
(7) E.piphany, Inc., 1900 South Norfolk Street, Suite 310, San Mateo, CA 94403, USA
Abstract
Many text databases on the web are “hidden” behind search interfaces, and their documents are only accessible through querying.
Traditional search engines typically ignore the contents of such search-only databases. Recently, Yahoo-like directories have
started to manually organize these databases into categories that users can browse to find these valuable resources. We propose
a novel strategy to automate the classification of search-only text databases. Our technique starts by training a rule-based
document classifier, and then uses the classifier’s rules to generate probing queries. The queries are sent to the text databases,
which are then classified based on the number of matches that they produce for each query. We report initial exploratory
experiments showing that our approach is a promising way to automatically characterize the contents of text databases accessible
on the web.
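
The pipeline described in the abstract (classifier rules turned into probing queries, then classification from match counts) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the rules in `RULES`, the `ToyDatabase` class that stands in for a search-only interface, and the `min_share` threshold used as the decision criterion.

```python
from collections import Counter

# Hypothetical rules of the kind a rule-based document classifier might learn:
# a conjunction of words that implies a category. Not taken from the paper.
RULES = [
    (("ibm", "computer"), "Computers"),
    (("jordan", "bulls"), "Sports"),
    (("diabetes",), "Health"),
    (("cancer", "lung"), "Health"),
]


class ToyDatabase:
    """Stand-in for a search-only database: it reveals match counts only."""

    def __init__(self, documents):
        self._docs = [set(doc.lower().split()) for doc in documents]

    def count_matches(self, terms):
        # Boolean AND query: a document matches if it contains every term.
        return sum(1 for words in self._docs if all(t in words for t in terms))


def probe(database, rules):
    """Send one query per rule and add up the match counts per category."""
    counts = Counter()
    for terms, category in rules:
        counts[category] += database.count_matches(terms)
    return counts


def classify(database, rules, min_share=0.5):
    """Assign every category whose share of all matches reaches min_share.

    The threshold rule is an assumed, simplified decision criterion.
    """
    counts = probe(database, rules)
    total = sum(counts.values())
    if total == 0:
        return []  # no probe matched, so no category can be assigned
    return sorted(c for c, n in counts.items() if n / total >= min_share)


health_db = ToyDatabase([
    "lung cancer screening guidelines",
    "managing diabetes with diet",
    "diabetes and heart disease",
    "ibm releases new computer",
])

print(probe(health_db, RULES))
print(classify(health_db, RULES))
```

The classifier never reads the documents themselves; it only sees how many documents match each probe, which is the information a search-only interface typically exposes.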