Lingvodoc: a virtual laboratory for documenting endangered languages

Lingvodoc is a system for collaborative multi-user documentation of endangered languages, for creating multi-layered dictionaries, and for scientific work with the collected audio and text data. It is a joint project with the Institute of Linguistics of the Russian Academy of Sciences and Tomsk State University. Lingvodoc has been under active development since 2012 and can be found on

Features and advantages

Lingvodoc is an open-source, cross-platform system based on innovative research (,

Lingvodoc provides:

  • Collaborative work on dictionaries (as opposed to the similar Starling project, which does not support this feature).
  • Saving the full history of user actions.
  • Working with audio-text corpora and dictionaries simultaneously, based on integration with the ELAN system developed by the Max Planck Institute for Psycholinguistics (Netherlands).
  • Creating and editing unidirectional and bidirectional connections between lexical entries within dictionaries, as well as external connections between dictionaries.
  • Recording, playing and storing sounds with markup (in WAV, MP3 and FLAC formats), as well as computing vowel formants and visualizing the resulting data.
  • Advanced search that supports multiple parameters (as opposed to the similar TypeCraft project).
  • Ability to search data on a map with automatic construction of isoglosses.
  • Conflict-free bilateral delayed synchronization.
  • High automation level (compared to the similar Kielipankki project): ability to carry out automatic etymological and phonetic analysis.
  • Creating dictionaries of any structure, such as typical two-layer dictionaries with a lexical entry layer and a paradigm layer, or multi-layer dictionaries. Importing dictionary structures is also supported.
  • Algorithms mimicking scientists’ work for phonetic and etymological analysis.
  • Support for storing text corpora in the Word format and dictionaries in the Excel format.
  • Built-in morphological analysis for the languages of Russia in the Apertium format.
  • A convenient interface for disambiguating homonyms after completing morphological analysis.
  • Deployment either on the ISP RAS cloud infrastructure or on local resources with data isolation.
  • Desktop and web-based versions.
  • Open registration (confirmation required).
  • Rapid development of new system features, as well as easy adaptation to other scientific fields.
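
To make the two-layer dictionary structure and the entry connections listed above more concrete, here is a minimal Python sketch. It is an illustration only and does not reflect Lingvodoc's actual data model or API: the names `LexicalEntry`, `Dictionary` and `connect`, the identifier scheme, and the sample forms are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalEntry:
    """One entry in the lexical entry layer (hypothetical model, not Lingvodoc's schema)."""
    entry_id: str
    form: str
    gloss: str
    # Second layer: paradigm forms attached to this entry.
    paradigm_forms: list[str] = field(default_factory=list)
    # Ids of connected entries, in this or another dictionary.
    links: set[str] = field(default_factory=set)


class Dictionary:
    """A named collection of lexical entries, keyed by entry id."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: dict[str, LexicalEntry] = {}

    def add(self, entry: LexicalEntry) -> None:
        self.entries[entry.entry_id] = entry


def connect(source: LexicalEntry, target: LexicalEntry, bidirectional: bool = True) -> None:
    """Link source to target; also link target back to source when bidirectional."""
    source.links.add(target.entry_id)
    if bidirectional:
        target.links.add(source.entry_id)


# Placeholder data: two dictionaries with one externally connected pair of entries.
dict_a = Dictionary("dictionary-a")
dict_b = Dictionary("dictionary-b")
entry_a = LexicalEntry("a:1", "form-a", "water", paradigm_forms=["form-a-pl"])
entry_b = LexicalEntry("b:1", "form-b", "water")
dict_a.add(entry_a)
dict_b.add(entry_b)
connect(entry_a, entry_b)  # bidirectional connection across dictionaries
```

Storing links as entry identifiers rather than object references is one possible design choice here; it lets a connection point into another dictionary without loading that dictionary.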

Who is Lingvodoc's target audience?

Lingvodoc is designed primarily for linguists researching and documenting the endangered languages of Russia. However, the technology can be adapted for other purposes.

Lingvodoc deployment stories

Lingvodoc is currently used by philologists at 29 universities and research centers in 16 cities, including Tomsk State University, the Institute of Philology (Siberian Branch of RAS), the Institute of History, Language and Literature (Ufa Scientific Center of RAS), Udmurt Federal Research Center UB RAS, North-Eastern Federal University, Ugra State University, the Institute of Linguistics, Literature and History (Karelian Research Centre of RAS), and Murmansk Arctic State University.

Lingvodoc workflow

[Figure: Lingvodoc workflow]
