A chronological survey of the development of machine speech recognition is contrasted with the beginnings of speech synthesis. The advantages and disadvantages of the different systems and approaches are sketched, along with their changing degrees of dependence on phonetic knowledge. The unsolved fundamental problem of concatenation quality in present-day synthesis is discussed, and a knowledge-based solution is put forward that can also be projected onto recognition: a mathematical model of the relationship between temporally overlapping underlying articulatory gestures and the resulting surface acoustic signal.
Keywords: speech synthesis, speech recognition, concatenation, articulatory gestures