
Abstract

Although multilingual text processing is required for digital libraries, language education, and natural language processing, it has been difficult to realize because glyphs are confused with characters. This paper defines the relation between a glyph and a character, and describes the processing of Mongolian scripts, an extended case of Perso-Arabic scripts, which our system handles in ways that generalize to many complicated scripts. Mongolian scripts have particularly complicated orthographies and are almost impossible to encode. However, separating glyph-defining information from encoding position solves some important problems arising from these and other scripts (including mixed-language text), which may require rendering in multiple directions. Our study of many scripts led us to attach attributes to the Wide Characters (WCs) of POSIX. These attributes carry not only the information needed for text manipulation (applied to a character) but also the glyph information needed for display, such as variant and position. Moreover, information that is not available in a character code is supplied from our system's database and embedded in a WC's attribute.
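The idea of carrying glyph information alongside each character, rather than in the code itself, can be illustrated with a minimal sketch. It is not the system described in this paper. The names `AttributedChar`, `is_mongolian_letter`, and `annotate` are hypothetical, the use of Unicode code points for Mongolian letters is an assumption made only for illustration, and the positional rule is a simplified Perso-Arabic-style one that ignores the real orthographic variants.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AttributedChar:
    """A character plus display attributes kept outside its code (hypothetical)."""
    code: str                        # the encoded character itself
    direction: str = "ltr"           # writing direction: "ltr" or "ttb" (top to bottom)
    position: Optional[str] = None   # "initial", "medial", "final", "isolated"
    variant: int = 0                 # glyph variant index; 0 means the default form


def is_mongolian_letter(ch: str) -> bool:
    # Assumption: letters are taken from the Unicode Mongolian block.
    return 0x1820 <= ord(ch) <= 0x1878


def annotate(text: str) -> List[AttributedChar]:
    """Attach direction and positional form to every character of `text`."""
    chars = [AttributedChar(code=c) for c in text]
    for i, ac in enumerate(chars):
        if not is_mongolian_letter(ac.code):
            continue  # other characters keep the default attributes
        ac.direction = "ttb"
        joins_prev = i > 0 and is_mongolian_letter(text[i - 1])
        joins_next = i + 1 < len(text) and is_mongolian_letter(text[i + 1])
        if joins_prev and joins_next:
            ac.position = "medial"
        elif joins_prev:
            ac.position = "final"
        elif joins_next:
            ac.position = "initial"
        else:
            ac.position = "isolated"
    return chars
```

In this sketch the character code stays unchanged while direction, position, and variant travel with it as attributes, so text manipulation can look only at `code` and a renderer can look only at the display fields.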
An arrow added to each script name shows the direction in which it is written. Paspa script was invented by Paspa, a Tibetan, during the Yuan Dynasty, and was intended to serve as a kind of International Phonetic Alphabet of its time. This script is not discussed here; we add only that it belongs to the Devanagari Script group. Manchu, the official written language of the Ching Dynasty, and its descendant Sibo are based on Mongolian script; together they form the Mongolian script family.
When it was introduced to the Mongolian people in the 13th century, the classic Uigur script, which itself was borrowed from the Sogdians in the 8th century, was turned 90 degrees to the left and written vertically from top to bottom. For the present Uigur script, see below.
