Automatic document analysis tools for mathematical texts are necessary to enlarge the pool of mathematical knowledge available
in electronic form. However, development of such tools is currently hindered by the weakness of optical character recognition
systems in dealing with the large range of mathematical symbols and the often subtle but important distinctions in font usage
in mathematical texts. Research on developing better systems for mathematical optical character recognition crucially depends
on having an extensive, high-quality database of glyphs used in mathematical texts for training and testing purposes. We present
such a database of symbols constructed from a large set of characters available in the LaTeX document preparation system that
can serve as a basis for mathematical text recognition. We describe its integration into a prototypical optical character
recognition system for mathematics that enables the construction of LaTeX source documents from mathematical documents available
as images. From the lessons learned in this work, we derive a road map for further research in the area of mathematical text
analysis.