Using Trigrams for Automatic Speech Recognition

The material was received by the Editorial Board: 18.03.2020
Abstract
Among existing theories of speech recognition, the most widely accepted treats perception as an adaptive process in which the procedure of perception is subordinated to the means of signal identification and the purpose of listening. A listener recognizes a word, in isolation or in context, after hearing it to the end, and the moment of recognition depends on a number of physical and linguistic characteristics. This moment is defined as the recognition point, from which the sequence of segments is associated with a particular word. The difficulty of automatic speech recognition stems from the great variability of acoustic signals, which has several causes: different realizations of phonemes, the position and characteristics of acoustic receivers, changes in the speech parameters of a single speaker, and differences between speakers. Word boundaries can be determined only during recognition, by selecting the word sequence that best matches the input speech stream according to acoustic, linguistic, and pragmatic criteria. One method of implementing automatic speech recognition is the consolidation of coding units, understood as the relationship between those elements of a signal sequence that are most strongly related. Recognition thresholds for meaningless sequences are identified, and sequences of three sounds (trigrams) are taken to be the most suitable length for such sequences. Based on a study of the pronunciation difficulty of Russian trigrams, it is assumed that easily pronounced trigrams are recognized with higher probability than those that are difficult to pronounce. Using trigrams during decoding is expected to improve speech recognition quality, because a longer word divided into trigrams is broken into syllable-like units that are pronounced more distinctly.
It is pointed out that many trigrams correspond to a typical Russian syllable, so that when speech is divided into trigrams, the subsequent sounds can be predicted probabilistically. This approach makes it possible to use hidden Markov models, in which the chain is represented as a graph whose nodes are states and whose arcs are possible transitions between states, each transition having its own probability of occurrence.
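
The graph of states and transition probabilities described above can be illustrated with a minimal sketch. It is not the author's implementation: it assumes that letters stand in for sounds, that words are cut into consecutive non-overlapping three-symbol chunks, and that a plain first-order Markov chain over those chunks (with no hidden states or acoustic emission probabilities) is enough for illustration. The function names and the toy word list of transliterated Russian words are hypothetical.

```python
from collections import Counter, defaultdict


def split_into_trigrams(word):
    """Cut a word into consecutive non-overlapping chunks of three symbols.

    Assumption: letters approximate sounds; the last chunk may be shorter.
    """
    return [word[i:i + 3] for i in range(0, len(word), 3)]


def transition_probabilities(words):
    """Estimate P(next chunk | current chunk) from a list of words.

    Each chunk is a node of the graph; each observed chunk-to-chunk
    step is an arc weighted by its relative frequency.
    """
    counts = defaultdict(Counter)
    for word in words:
        chunks = split_into_trigrams(word)
        for current, following in zip(chunks, chunks[1:]):
            counts[current][following] += 1

    probabilities = {}
    for state, followers in counts.items():
        total = sum(followers.values())
        probabilities[state] = {nxt: n / total for nxt, n in followers.items()}
    return probabilities


# Hypothetical toy data: transliterated Russian words.
toy_words = ["moloko", "molot", "golova"]
model = transition_probabilities(toy_words)

for state, followers in model.items():
    print(state, followers)
```

In this toy data the chunk "mol" is followed by "oko" in one word and by "ot" in another, so each arc leaving "mol" receives probability 0.5, while "gol" is always followed by "ova". A full hidden Markov model would additionally attach to each state a probability distribution over observed acoustic features.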

Keywords: speech recognition, trigram, hidden Markov model, combination, understanding of speech
References: Butenko, Iuliia. Using Trigrams for Automatic Speech Recognition. Vestnik NSU. Series: Linguistics and Intercultural Communication. 2020. Vol. 18, 3.