
Knowledge Discovery from Semistructured Texts

Hiroshi Sakamoto, Hiroki Arimura and Setsuo Arikawa

(2)  Department of Informatics, Kyushu University, Hakozaki 6-10-1, Higashi-ku, 812-8581 Fukuoka-shi, Japan
Abstract
This paper surveys our recent results on knowledge discovery from semistructured texts, which contain heterogeneous structures represented by labeled trees. The aim of our study is to extract useful information from documents on the Web. First, we present theoretical results on learning rewriting rules between labeled trees. Second, we apply our method to learning HTML trees in the framework of wrapper induction. We also evaluate our algorithms on real-world HTML documents and present the results.

Hiroshi Sakamoto
Email: hiroshi@i.kyushu-u.ac.jp

Hiroki Arimura
Email: arim@i.kyushu-u.ac.jp

Setsuo Arikawa
Email: arikawa@i.kyushu-u.ac.jp