Research on the impact of network degradation on automatic speech recognition models


Polevoi A.V. (MSU, Moscow, Russia)
Loukachevitch N.V. (MSU, Moscow, Russia; SRCC MSU, Moscow, Russia)

Abstract

Although automatic speech recognition models perform well on many datasets and in many languages, their everyday use remains limited because they handle certain scenarios poorly, such as calls over unstable network connections or telephone channels with interference. In this paper, we present a specialized benchmark for Russian-language speech that includes a representative dataset simulating the effects of an unstable internet connection on speech. The benchmark is designed to test and compare the performance of modern speech recognition approaches. To quantify the level of degradation, we use an automated method based on a set of acoustic features and neural network metrics. The benchmark results allow us to identify the methods that are more robust to acoustic distortion. These methods can be used to improve the reliability of speech recognition systems under challenging real-world conditions.

Keywords

automatic speech recognition; WER estimation; audio benchmark of speech recordings; audio recordings with unstable internet connection.

Edition

Proceedings of the Institute for System Programming, vol. 38, issue 3, part 4, 2026, pp. 101-118

ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).

DOI: 10.15514/ISPRAS-2026-38(3)-49

For citation

Polevoi A.V., Loukachevitch N.V. Research on the impact of network degradation on automatic speech recognition models. Proceedings of the Institute for System Programming, vol. 38, issue 3, part 4, 2026, pp. 101-118. DOI: 10.15514/ISPRAS-2026-38(3)-49.

Full text of the paper in pdf (in Russian)