Semistructured data is characterized by the absence of a fixed, rigid schema, although some implicit structure typically
appears in the data. The large number of online applications makes it important to mine the schema of semistructured
data, both for users (e.g., to gather useful information and to facilitate querying) and for systems (e.g., to optimize
access). The critical problem is to discover the implicit structure in semistructured data. Current methods for extracting
the structure of Web data are either general and independent of the application context [8], [9], or bound to a concrete
environment such as HTML [13], [14], [15]. Both, however, incur high costs and have difficulty keeping pace with frequent and
complex changes in Web data. In this paper, we first address the problem of incrementally mining the schema of semistructured
data after the raw data have been updated. We present an algorithm for incrementally mining the schema of semistructured data,
and experimental results show that our incremental mining is more efficient than non-incremental mining.
Keywords: Data Mining; Incremental Mining; Semistructured Data; Schema; Algorithm
This work was supported by the National Natural Science Foundation of China and the National Doctoral Subject Foundation of
China.