This paper describes a graph-based approach to document classification. The graph representation offers the advantage
that it allows for a much more expressive document encoding than the more standard bag of words/phrases approach, and can consequently
improve classification accuracy. Document sets are represented as graph sets to which a weighted graph mining algorithm
is applied to extract frequent subgraphs, which are then further processed to produce feature vectors (one per document) for
classification. Weighted subgraph mining is used to ensure classification effectiveness and computational efficiency; only
the most significant subgraphs are extracted. The approach is validated and evaluated using several popular classification
algorithms together with a real-world textual data set. The results demonstrate that the approach can outperform existing
text classification algorithms on some data sets. As the size of the data set increases, further processing of the extracted frequent
features becomes essential.
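
The pipeline outlined above (documents to graphs, weighted frequent subgraph extraction, one feature vector per document) can be illustrated with a minimal sketch. It is not the weighted graph mining algorithm used in this work: it assumes a simple word-adjacency graph encoding, restricts "subgraphs" to single edges, and uses an illustrative definition of weighted support. The function names `doc_to_graph`, `weighted_support`, `mine_frequent_edges` and `to_feature_vectors`, and the threshold value, are hypothetical.

```python
from collections import Counter


def doc_to_graph(text):
    """Encode a document as a weighted graph (assumed encoding, for illustration).

    Nodes are words; an undirected edge links two different words that appear
    next to each other; the edge weight counts how often that happens.
    """
    words = text.lower().split()
    edges = Counter()
    for a, b in zip(words, words[1:]):
        if a != b:
            edges[frozenset((a, b))] += 1
    return edges


def weighted_support(edge, graphs):
    """Average, over all graphs, of the edge's share of each graph's total weight.

    This definition is an assumption made for the sketch.
    """
    total = 0.0
    for g in graphs:
        graph_weight = sum(g.values())
        if graph_weight:
            total += g[edge] / graph_weight  # a Counter returns 0 for a missing edge
    return total / len(graphs)


def mine_frequent_edges(graphs, min_support):
    """Keep only single-edge subgraphs whose weighted support reaches min_support."""
    candidates = set().union(*graphs)
    frequent = [e for e in candidates if weighted_support(e, graphs) >= min_support]
    return sorted(frequent, key=lambda e: sorted(e))  # deterministic order


def to_feature_vectors(graphs, features):
    """One binary vector per document: 1 if the document graph contains the feature."""
    return [[1 if f in g else 0 for f in features] for g in graphs]


docs = [
    "graph mining finds frequent subgraphs",
    "graph mining builds feature vectors",
    "bag of words ignores word order",
]
graphs = [doc_to_graph(d) for d in docs]
features = mine_frequent_edges(graphs, min_support=0.1)  # threshold chosen arbitrarily
vectors = to_feature_vectors(graphs, features)
```

In this toy example only the edge between "graph" and "mining" passes the threshold, so each document is reduced to a one-element vector. The resulting vectors could then be passed to any standard classifier. A full frequent subgraph miner would additionally grow multi-edge patterns from these frequent edges.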