Post-supervised Template Induction for Dynamic Web Sources
Zhongmin Shi, Evangelos Milios and Nur Zincir-Heywood
Faculty of Computer Science, Dalhousie University, Halifax, N.S., Canada, B3H 1W5
Abstract
Dynamic web sites commonly return information in the form of lists and tables. Hand crafting an extraction program
for a specific template is straightforward but time-consuming, so it is desirable to automatically generate template extraction
programs from examples of lists and tables in HTML documents. We describe a novel technique, Post-supervised Learning, which
exploits unsupervised learning to avoid the need for training examples, while minimally involving the user to achieve high
accuracy. We have developed unsupervised algorithms to extract the number of rows and adopted a dynamic programming algorithm
for extracting columns. Our system, called TIDE (Template Induction for web Data Extraction), achieves high performance with
minimal user input compared to fully supervised techniques.
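
The abstract does not say which dynamic programming algorithm is used for column extraction. As a rough illustration only, the sketch below assumes a longest-common-subsequence (LCS) alignment between the token sequences of two table rows: tokens shared by both rows are treated as template markup, and the gaps between them are candidate column values. The function names `lcs_align` and `shared_template` and the sample rows are hypothetical and are not taken from TIDE.

```python
from typing import List, Tuple


def lcs_align(a: List[str], b: List[str]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of tokens aligned by an LCS of a and b."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk the table from the start to recover one optimal alignment.
    pairs = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def shared_template(a: List[str], b: List[str]) -> List[str]:
    """Tokens common to both rows, in order (assumed to be template markup)."""
    return [a[i] for i, _ in lcs_align(a, b)]


# Hypothetical tokenized rows from an HTML table.
row1 = ["<tr>", "<td>", "Alice", "</td>", "<td>", "30", "</td>", "</tr>"]
row2 = ["<tr>", "<td>", "Bob", "</td>", "<td>", "41", "</td>", "</tr>"]
template = shared_template(row1, row2)
```

In this sketch the unaligned tokens ("Alice" and "30" in the first row, "Bob" and "41" in the second) fall between template tokens, which is where column values would be read from. A real system would also need to handle rows whose data values happen to coincide and rows with optional columns.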