Computer classification of Russian poetic texts by genres and styles

Received by the Editorial Board: 03.07.2017
Abstract
Automating the complex analysis of poetic texts includes the task of identifying their genre and stylistic characteristics, which we regard as important attributes for determining how the lower levels of verse (meter, rhythm, prosody, vocabulary, grammar) influence the higher ones. To solve this task, the paper analyses the principles for forming training sets that can then be used to refine automated algorithms for the adequate identification of the genres and styles of Russian poetic texts. To find out which algorithm classifies poetic texts most accurately, we carried out a number of computational experiments on the corpus of A. S. Pushkin's Lyceum poetry, including experiments with ensemble methods, which build compositions of algorithms; the advantage of this technique is that the errors of individual algorithms compensate one another. In building ensembles, we consider algorithms of the form of a superposition of two functions: an algorithmic operator, which maps the set of objects into a space of estimates, and a decision rule, which maps the space of estimates into the set of values of the target function. Our computational experiments, based on single words, bigrams and trigrams from the poems of the Pushkin training set, allowed us to test the best-known ensemble methods, such as weighted voting, boosting and stacking. The algorithms developed for the task proved efficient at adequately identifying the genres and styles of Russian poetic texts. We also found that even simple classifiers based on lexical features or on n-grams can give good results.
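The ensemble scheme described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy training texts and genre labels are made up (they are not the Pushkin corpus), the per-class n-gram overlap scoring in the hypothetical `make_operator` is a deliberately simple stand-in for real classifiers, and the weights are arbitrary. Each algorithm is written as the superposition a(x) = C(b(x)) of an algorithmic operator b (text to one score per class, i.e. a point in the space of estimates) and a decision rule C (argmax over classes); weighted voting sums the estimates of the unigram, bigram and trigram operators before a single decision rule is applied.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Estimates = Dict[str, float]  # a point in the "space of estimates": one score per class


def ngrams(text: str, n: int) -> List[Tuple[str, ...]]:
    """Word n-grams of a text (n=1: single words, n=2: bigrams, n=3: trigrams)."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def make_operator(train: Sequence[Tuple[str, str]], n: int) -> Callable[[str], Estimates]:
    """Hypothetical algorithmic operator b_n: text -> class scores.

    Score of a class = share of the text's n-grams that occur in the
    training texts of that class (an illustrative stand-in classifier).
    """
    vocab: Dict[str, set] = {}
    for text, label in train:
        vocab.setdefault(label, set()).update(ngrams(text, n))

    def b(text: str) -> Estimates:
        grams = ngrams(text, n)
        total = max(len(grams), 1)  # avoid division by zero for short texts
        return {label: sum(g in seen for g in grams) / total
                for label, seen in vocab.items()}

    return b


def decision_rule(est: Estimates) -> str:
    """Decision rule C: space of estimates -> class label (argmax, ties broken alphabetically)."""
    return max(sorted(est), key=lambda label: est[label])


def weighted_vote(operators: Sequence[Callable[[str], Estimates]],
                  weights: Sequence[float]) -> Callable[[str], Estimates]:
    """Ensemble operator: weighted sum of the base operators' estimates."""
    def b(text: str) -> Estimates:
        combined: Dict[str, float] = {}
        for op, w in zip(operators, weights):
            for label, score in op(text).items():
                combined[label] = combined.get(label, 0.0) + w * score
        return combined

    return b


# Hypothetical toy training set (NOT the Pushkin corpus): (text, genre label).
train = [
    ("sad night and quiet grief of parting", "elegy"),
    ("the grief of youth has passed like night", "elegy"),
    ("dear friend I write to you this letter", "epistle"),
    ("my friend accept this letter and my verse", "epistle"),
]
operators = [make_operator(train, n) for n in (1, 2, 3)]
weights = [0.2, 0.3, 0.5]  # arbitrary illustrative weights


def classify(text: str) -> str:
    # a(x) = C(b(x)): superposition of the ensemble operator and the decision rule
    return decision_rule(weighted_vote(operators, weights)(text))


print(classify("quiet grief of night"))          # elegy
print(classify("dear friend accept my letter"))  # epistle
```

Only weighted voting is shown. Stacking would replace the fixed weights with a meta-classifier trained on the base operators' estimates, and boosting would train the base algorithms sequentially, giving more weight to the objects that earlier algorithms misclassified.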
Using the criterion of maximizing the minimum precision, we established that a multilayer perceptron should be used. These computer algorithms can significantly simplify the work of experts studying Russian poetic styles and genres.
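The model-selection criterion can be sketched as follows. This is a hedged illustration, not the authors' procedure: it assumes (the abstract does not state this) that precision is computed for each class on held-out data and that the minimum is taken over classes, so the chosen classifier is the one whose worst-performing class is best. The names `model_A` and `model_B` and their predictions are invented; the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def per_class_precision(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Precision TP / (TP + FP) for every class seen in y_true or y_pred."""
    result: Dict[str, float] = {}
    for c in sorted(set(y_true) | set(y_pred)):
        true_of_predicted_c = [t for t, p in zip(y_true, y_pred) if p == c]
        # Convention (an assumption): a class that is never predicted gets precision 0.
        result[c] = (sum(t == c for t in true_of_predicted_c) / len(true_of_predicted_c)
                     if true_of_predicted_c else 0.0)
    return result


def min_precision(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    # Assumption: the minimum is taken over classes.
    return min(per_class_precision(y_true, y_pred).values())


def select_by_max_min_precision(y_true: Sequence[str],
                                predictions: Dict[str, List[str]]) -> str:
    """Name of the classifier whose worst per-class precision is highest."""
    return max(sorted(predictions), key=lambda name: min_precision(y_true, predictions[name]))


# Invented held-out labels and predictions of two hypothetical classifiers.
y_true = ["elegy", "elegy", "epistle", "epistle"]
predictions = {
    "model_A": ["elegy", "elegy", "elegy", "epistle"],    # precisions: elegy 2/3, epistle 1
    "model_B": ["elegy", "epistle", "epistle", "elegy"],  # precisions: elegy 1/2, epistle 1/2
}
print(select_by_max_min_precision(y_true, predictions))  # model_A
```

Maximizing the minimum rather than the average guards against a classifier that scores well overall while being unreliable on one genre or style.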

Keywords: computer analysis of poetic texts, identification of genres and styles, classification algorithms
For citation: Barakhnin, V. B. Computer classification of Russian poetic texts by genres and styles. NSU Vestnik Journal, Series: Linguistics and Intercultural Communication, 2017, vol. 15, no. 3, pp. 13–23. DOI: 10.25205/1818-7935-2017-15-3-13-23