We propose
nested words to capture models where there is
both a natural linear sequencing of positions and a hierarchically nested matching of positions. Such dual structure exists for
executions of structured programs where there is a natural well-nested correspondence among entries to and exits from program
components such as functions and procedures, and for XML documents where each open-tag is matched with a closing tag in a
well-nested manner.
We define and study finite-state automata as acceptors of nested words. A nested-word automaton is similar to a classical
finite-state word automaton, and reads the input from left to right according to the linear sequence. However, at a position
with two predecessors, one due to linear sequencing and one due to a hierarchical nesting edge, the next state depends on
states of the run at both these predecessors. The resulting class of regular languages of nested words has all the appealing theoretical properties that the class of classical regular word languages
enjoys: deterministic nested word automata are as expressive as their nondeterministic counterparts; the class is closed under
operations such as union, intersection, complementation, concatenation, and Kleene-*; decision problems such as membership,
emptiness, language inclusion, and language equivalence are all decidable; definability in monadic second order logic of nested
words corresponds exactly to finite-state recognizability; and finiteness of the congruence induced by a language of nested
words is a necessary and sufficient condition for regularity.