Toward a system that teaches various tasks interactively and visually, this paper proposes a method for analyzing
instruction utterances. One of the biggest problems in dealing with spoken language is ellipsis/anaphora resolution. We resolve
it using a domain-specific case frame dictionary constructed automatically from a large amount of text. Then, we attach an utterance type
to each utterance to distinguish actions from notes, tips, etc. Based on the attached types, we analyze the discourse structure of the utterances and
detect units of actions.
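
As a rough illustration of the last step only, the sketch below groups typed utterances into action units. It is not the method of this paper: the type labels ("action", "note", "tip"), the example utterances, the names Utterance and group_action_units, and the grouping rule (each action utterance opens a new unit, and any following non-action utterances attach to it) are all assumptions made for illustration. The actual analysis relies on discourse structure rather than on this simple adjacency rule.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    text: str
    utype: str  # hypothetical labels: "action", "note", "tip"


def group_action_units(utterances):
    """Group utterances into units, one unit per "action" utterance.

    Assumed rule (illustrative only): an "action" utterance opens a new
    unit; a non-action utterance joins the most recent unit. Non-action
    utterances that appear before any action form a unit of their own.
    """
    units = []
    for u in utterances:
        if u.utype == "action" or not units:
            units.append([u])
        else:
            units[-1].append(u)
    return units


# Made-up example utterances, already labeled with an utterance type.
utterances = [
    Utterance("Cut the material in half.", "action"),
    Utterance("Be careful with the blade.", "note"),
    Utterance("Place both halves in the tray.", "action"),
    Utterance("A shallow tray works best.", "tip"),
]

units = group_action_units(utterances)
for i, unit in enumerate(units, start=1):
    print(i, [u.text for u in unit])
```

In this sketch each resulting unit holds one action together with the notes and tips that follow it, which is one simple way to read "a unit of actions".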