Linguistic Platforms Laboratory





The LingvoDoc platform (lingvodoc.ispras.ru), supervised by Ju. V. Normanskaja, principal researcher, PhD (Doctor of Sciences), was launched in 2013. It currently contains audio dictionaries and corpora of more than 450 endangered dialects of the Uralic and Altaic languages of Russia.

In addition to data storage and a database search engine, the platform supports simultaneous distributed processing of users' data and provides software for its subsequent analysis. Examples include tools for determining phonetic similarity between languages online, identifying morphological parameters used with a specific meaning, and drawing synchronic and diachronic maps of phonetic, morphological, or lexical isoglosses.

The LingvoDoc platform can store data from users affiliated with various organizations while preserving all the rights of the creators of dictionaries and corpora. It also supports a restricted access mode, in which only users selected by the creator of a dictionary or corpus can access it. Furthermore, each LingvoDoc user can compare the data in their own dictionaries with data on other dialects, using whatever parameters they choose, with original software developed by the laboratory staff. Since the platform already stores data on 450 dialects of the Uralic and Altaic languages of Russia, totaling more than 2 million word forms in a single digital format, comparative-historical, phonetic, and morphological analyses rely on big data processing techniques, which significantly increases the accuracy of the results.

A collaborative research project with other creators of national corpora of the languages of Russia is currently underway. It aims to provide a dedicated environment for running parsers online, word-sense disambiguation, and identifying collocations, and it will result in software tools for the corpus-based description of morphology. Building on this more accurate description of morphology, the next step will be to launch digital learning platforms powered by Revita, in cooperation with scholars from the University of Helsinki and the Saint Petersburg branch of the National Research University Higher School of Economics.

Since 2020, the laboratory has offered a further education course, "Digital Methods for Describing the Languages of the Peoples of Russia." The course helps students explore all the features of the LingvoDoc platform and receive individual feedback on how best to process their data with it.

This project addresses a pressing task set by the Ministry of Science and Higher Education of the Russian Federation. On September 17, 2020, at the final meeting of the "Leaders of Scientific and Technological Breakthrough" program in Skolkovo, the virtual laboratory LingvoDoc, created in collaboration with the Institute for System Programming, was presented to Minister V. N. Falkov. Over 8 years, the LingvoDoc team has compiled audio dictionaries of 900 dialects of the languages of Russia, a productivity 100 times higher than that of typical research teams worldwide. With the help of LingvoDoc, the number of Q1–Q2 articles has increased almost tenfold compared with other research institutes. At the meeting in Skolkovo and at a personal meeting with K. A. Shved, it was recommended that the Institute for System Programming create similar virtual laboratories, including one for genetics. Our project implements these recommendations by bringing together geneticists from most of the specialized organizations, linguists, cartographers, and the Institute for System Programming.

Many Uralic dialects currently have no descriptions of their phonetics or grammar and no dictionaries, while the existing descriptions follow different standards and are difficult to access. The dialects themselves are on the very brink of extinction, and the archival data describing them are at risk of being lost. A few regional enthusiasts have tried to develop their own writing systems for recording dialect texts in order to create dictionaries, primers, and textbooks. However, these systems usually differ from speaker to speaker, so the methods for recording dialects in the 21st century actually turn out to be less standardized than those of the creators of the first Slavic books in the 11th–13th centuries. In some regions, this may even lead to conflicts.

Through this project, the website lingvodoc.ispras.ru will present dictionary data on all the dialect groups of the Uralic languages, including both fresh field data on modern dialects and the earliest available archival data. A subsequent phonetic and comparative-historical analysis of this comprehensive dictionary data, carried out with dedicated software, will then establish the degree of genetic proximity between the dialects. The results of this analysis will be plotted on a map of the Russian Federation and presented as genetic language trees.

Head of the Laboratory

Normanskaja Julia Viktorovna, Head of the Laboratory, principal researcher, Professor, PhD (Doctor of Sciences) in Philology. Project leader of the Russian Science Foundation grant no. 20-18-00403 "Digital Description of the Dialects of the Uralic Languages Using Big Data."

Laboratory Staff

Alpatov Vladimir Mihajlovich, principal researcher, academician of the Russian Academy of Sciences, Professor, PhD (Doctor of Sciences) in Philology (supported by the RSF grant)
Amelina Mariya Konstantinovna, junior researcher
Bazhenova Ol’ga Nikolaevna, junior researcher (supported by the RSF grant)
Bezenova Mariya Petrovna, senior researcher, PhD (Candidate of Sciences) in Philology
Vorob’eva Viktoriya Vladimirovna, researcher, PhD (Candidate of Sciences) in Philology
Gadzhieva Anar Akhmetbekovna, senior researcher, PhD (Candidate of Sciences) in Philology
Gajdamashko Roman Valentinovich, senior researcher, PhD (Candidate of Sciences) in Philology (supported by the RSF grant)
Kazakevich Ol’ga Anatol’evna, senior researcher, PhD (Candidate of Sciences) in Philology (supported by the RSF grant)
Kashkin Egor Vladimirovich, senior researcher, PhD (Candidate of Sciences) in Philology (supported by the RSF grant)
Klement’eva Elena Filippovna, researcher, PhD (Candidate of Sciences) in Philology (supported by the RSF grant)
Klyucheva Mariya Arkad’evna, researcher, PhD (Candidate of Sciences) in Art Criticism (supported by the RSF grant)
Kovylin Sergej Vasil’evich, senior researcher, PhD (Candidate of Sciences) in Philology
Koshelyuk Natal’ya Andreevna, junior researcher (supported by the RSF grant)
Levina Mariya Zakharovna, researcher, PhD (Candidate of Sciences) in Philology (supported by the RSF grant)
Li Polina Igorevna, laboratory assistant (supported by the RSF grant)
Mishchenkova Karina Olegovna, junior researcher
Moldanova Irina Maksimovna, junior researcher (supported by the RSF grant)
Napol’nova Elena Markovna, senior researcher, PhD (Candidate of Sciences) in Philology
Novak Irina Petrovna, researcher, PhD (Candidate of Sciences) in Philology (supported by the RSF grant)
Nurieva Fanuza Shakurovna, principal researcher, Professor, PhD (Doctor of Sciences) in Philology
Pustogacheva Oksana Nikolaevna, specialist
Ryabova Galina Viktorovna, researcher, PhD (Candidate of Sciences) in Philology (supported by the RSF grant)
Fedina Marina Serafimovna, senior researcher, PhD (Candidate of Sciences) in Philology (supported by the RSF grant)
Fedotova Idaliya Vyacheslavovna, junior researcher
Hozumova Raisa Pavlovna, laboratory assistant (supported by the RSF grant)