Abstract

This paper addresses optical character recognition (OCR) of printed mathematical expressions. The construction of a representative corpus of technical and scientific documents containing expressions is discussed. A statistical investigation of the corpus is presented, and the usefulness of this analysis is demonstrated for four related research problems: (i) identification and segmentation of expression zones from the rest of the document, (ii) recognition of expression symbols, (iii) interpretation of expression structures, and (iv) performance evaluation of a mathematical expression recognition system. In addition, a groundtruthing format is proposed to facilitate automatic evaluation of expression recognition techniques.

Keywords: OCR - Mathematical expressions - Database - Groundtruthing - Statistical learning - Performance evaluation

Received: 10 July 2003, Accepted: 22 November 2004, Published online: 18 March 2005
Correspondence to: Utpal Garain
