A machine learning technique called Graph-Based Induction (GBI) extracts typical patterns from graph data by stepwise pair
expansion (pairwise chunking). Because of its greedy search strategy, it is very efficient, but its search is incomplete.
We improved its search capability without adding much computational cost by incorporating the idea of beam
search. Further improvements were made to extract patterns that are more discriminative than those that merely occur frequently,
and to enumerate identical patterns accurately using canonical labeling. The new algorithm, called Beam-wise GBI
(B-GBI for short), was implemented and tested on a DNA data set from the UCI repository. Because DNA data is a sequence
of symbols, it makes little sense to represent each sequence as attribute-value pairs by simply assigning the symbols to
the values of ordered attributes. By transforming the sequence into a graph structure and running B-GBI, it is possible to extract
discriminative substructures, which can serve as new attributes for a classification problem. The effect of beam width on
the number of discovered attributes and on predictive accuracy was evaluated, together with the extracted characteristic
subsequences, and the results indicate the effectiveness of B-GBI.
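
As a rough illustration of the ideas above, the following Python sketch turns a symbol sequence into a graph and then selects candidate pairs for chunking. It is a minimal sketch under stated assumptions, not the B-GBI implementation: the graph encoding (one node per position, edges to positions up to `max_dist` ahead, labeled by distance), the function names, and the use of raw frequency as the selection criterion are all hypothetical choices made for the example. The chunking (graph rewriting) step, the discriminative selection measure, and canonical labeling are omitted.

```python
from collections import Counter


def sequence_to_graph(seq, max_dist=2):
    """Encode a symbol sequence as a graph (assumed encoding, for illustration).

    Nodes are positions labeled with their symbol. An edge (i, j, d) links
    position i to a later position j at distance d = j - i <= max_dist.
    """
    nodes = list(seq)
    n = len(nodes)
    edges = [
        (i, j, j - i)
        for i in range(n)
        for j in range(i + 1, min(i + max_dist + 1, n))
    ]
    return nodes, edges


def count_pairs(nodes, edges):
    """Count each (label, distance, label) pair of connected nodes."""
    counts = Counter()
    for i, j, d in edges:
        counts[(nodes[i], d, nodes[j])] += 1
    return counts


def top_pairs(counts, beam_width):
    """Keep the beam_width most frequent pairs.

    beam_width = 1 corresponds to a purely greedy choice; a larger beam
    keeps several candidates alive. Ties are broken by the pair itself so
    that the result is deterministic.
    """
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:beam_width]


if __name__ == "__main__":
    nodes, edges = sequence_to_graph("ACAC", max_dist=2)
    for pair, freq in top_pairs(count_pairs(nodes, edges), beam_width=2):
        print(pair, freq)
```

With `beam_width=1` the selection reduces to the single most frequent pair, which mirrors the greedy strategy; widening the beam retains additional candidates at the cost of more work per step.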