Lecture Notes in Computer Science, 2001, Volume 2166/2001, 418-426, DOI: 10.1007/3-540-44805-5_56

Creation of a Corpus of Training Sentences Based on Automated Dialogue Analysis

Jana Schwarz and Václav Matoušek

Abstract

The development of computerized information-retrieval dialogue systems that communicate with the user in natural language requires an effective training procedure, one that allows the main modules of the dialogue system to be developed partly automatically. This paper describes an attempt to create sentence templates automatically, using a dedicated program package that implements a specially developed method for the quantitative linguistic analysis of transcribed real dialogues. First, the package generates a set of formulas (templates) that consist of elements of a special grammar and describe the syntactic structure of the required sentences. Second, it generates a large corpus of unique training sentences from these templates using a stochastic context-free grammar. The corpus created in this experiment was used to train the modules of a city information dialogue system.
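The abstract does not specify the template formalism or the grammar itself, so the following is only a minimal sketch of the second step, under stated assumptions: a toy stochastic context-free grammar in which each nonterminal maps to productions with probabilities, plus a few hand-written templates made of grammar symbols. Sentences are sampled by recursively expanding each template symbol according to the production probabilities, and duplicates are discarded so that the corpus contains only unique sentences. All names (`GRAMMAR`, `TEMPLATES`, `expand`, `generate_corpus`) and the city-information vocabulary are hypothetical illustrations, not taken from the paper.

```python
import random

# Hypothetical toy SCFG (assumption, not the paper's grammar): each nonterminal
# maps to a list of (probability, right-hand side) pairs. Symbols written in
# angle brackets are nonterminals; everything else is a terminal word.
GRAMMAR = {
    "<GREETING>": [(0.6, ["hello"]), (0.4, ["good", "morning"])],
    "<REQUEST>": [(0.5, ["where", "is"]), (0.5, ["how", "do", "I", "get", "to"])],
    "<PLACE>": [(0.4, ["the", "<POI>"]), (0.6, ["<STREET>"])],
    "<POI>": [(0.5, ["museum"]), (0.3, ["station"]), (0.2, ["town", "hall"])],
    "<STREET>": [(0.5, ["Main", "Street"]), (0.5, ["Park", "Avenue"])],
}

# Hypothetical sentence templates: sequences of grammar symbols describing
# the syntactic structure of a required sentence.
TEMPLATES = [
    ["<GREETING>", "<REQUEST>", "<PLACE>"],
    ["<REQUEST>", "<PLACE>"],
]


def expand(symbol, rng):
    """Expand one symbol into terminal words, sampling productions by probability."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal word: emit as-is
    rules = GRAMMAR[symbol]
    weights = [p for p, _ in rules]
    _, rhs = rng.choices(rules, weights=weights, k=1)[0]
    words = []
    for sym in rhs:
        words.extend(expand(sym, rng))
    return words


def generate_corpus(templates, n, seed=0, max_attempts=10_000):
    """Sample up to n unique sentences from randomly chosen templates."""
    rng = random.Random(seed)  # seeded for reproducible corpora
    corpus, seen = [], set()
    attempts = 0
    while len(corpus) < n and attempts < max_attempts:
        attempts += 1
        template = rng.choice(templates)
        words = []
        for sym in template:
            words.extend(expand(sym, rng))
        sentence = " ".join(words)
        if sentence not in seen:  # keep only unique training sentences
            seen.add(sentence)
            corpus.append(sentence)
    return corpus


if __name__ == "__main__":
    for s in generate_corpus(TEMPLATES, 5):
        print(s)
```

The `max_attempts` cap matters because a finite grammar can produce only finitely many distinct sentences (30 for this toy grammar), so asking for more than that would otherwise loop forever.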
