Data farm: Information system for collecting, storing and processing unstructured data from heterogeneous sources

Sergey Pavlovich LEVASHKIN, Konstantin Nikolaevich IVANOV, Sergey Vladimirovich KUSHUKOV

Abstract

This paper presents an original information system, the «data farm». Today, the successful application of artificial intelligence algorithms, primarily deep learning based on artificial neural networks, depends almost entirely on the availability of data: the larger the volume of data (big data), the better these algorithms perform. Well-known examples of such algorithms come from Facebook, Google, Microsoft, Yandex, and others. The data must include both a training sample and a test sample. Moreover, for learning algorithms to work adequately, the data must be of good quality and have a definite structure, ideally being labeled. Preparing such data is a serious problem that requires huge computational and human resources, and this paper addresses it. Today the data farm is a fairly complex information system with a modular architecture, similar to the well-known Lego construction set. Its individual modules are various modern algorithms, technologies, and entire libraries of artificial intelligence; together they are designed to automate obtaining and structuring high-quality big data in various subject domains. The system has been tested on COVID-19 data for the regions of Russia and for countries around the world. In addition, a user-friendly interface was developed for visualizing the data collected and processed on the farm. This makes it possible to run visual numerical computer-simulation experiments and compare them with real data, turning the farm into an intelligent decision support information system.
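The abstract describes the farm as a set of interchangeable modules that together turn heterogeneous raw data into structured data. The Python sketch below illustrates that Lego-like composition in general terms only; it is not the authors' implementation. The `Record` schema, the source adapters `from_csv_lines` and `from_json_like`, the `normalize` and `deduplicate` modules, the `pipeline` helper, and the toy inputs ("Region A" and so on, with made-up counts) are all hypothetical.

```python
from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass
from itertools import chain


# Hypothetical common schema that every source is brought to.
@dataclass(frozen=True)
class Record:
    region: str
    date: str
    cases: int


# A module is any callable that turns one stream of items into another.
Module = Callable[[Iterable], Iterator]


def from_csv_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Hypothetical source adapter for 'region,date,cases' text lines."""
    for line in lines:
        region, date, cases = line.strip().split(",")
        yield {"region": region, "date": date, "cases": cases}


def from_json_like(items: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical source adapter for dicts that use different key names."""
    for item in items:
        yield {"region": item["area"], "date": item["day"], "cases": item["confirmed"]}


def normalize(rows: Iterable[dict]) -> Iterator[Record]:
    # Structure each row: trimmed strings and an integer case count.
    for row in rows:
        yield Record(str(row["region"]).strip(), str(row["date"]).strip(), int(row["cases"]))


def deduplicate(records: Iterable[Record]) -> Iterator[Record]:
    # Keep only the first record seen for each (region, date) key.
    seen: set[tuple[str, str]] = set()
    for rec in records:
        key = (rec.region, rec.date)
        if key not in seen:
            seen.add(key)
            yield rec


def pipeline(*modules: Module) -> Module:
    # Compose modules left to right, like snapping bricks together.
    def run(stream: Iterable) -> Iterator:
        for module in modules:
            stream = module(stream)
        return iter(stream)

    return run


# Toy usage with made-up inputs from two differently formatted sources.
csv_source = ["Region A,2020-04-01,100", "Region B,2020-04-01,5"]
json_source = [
    {"area": "Region A", "day": "2020-04-01", "confirmed": 100},
    {"area": "Region C", "day": "2020-04-02", "confirmed": 7},
]
raw = chain(from_csv_lines(csv_source), from_json_like(json_source))
farm = pipeline(normalize, deduplicate)
result = list(farm(raw))
print(result)
```

Because each module only consumes and produces a stream, a new source adapter or processing step can be added or swapped without touching the others, which is one plausible reading of the Lego analogy in the abstract.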

Keywords

intelligent information system, data farm, big data, data processing, data visualization, computer modeling

Edition

Proceedings of the Institute for System Programming, vol. 35, issue 2, 2023, 57-72

ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).

DOI: 10.15514/ISPRAS-2023-35(2)-5

For citation

Sergey Pavlovich LEVASHKIN, Konstantin Nikolaevich IVANOV, Sergey Vladimirovich KUSHUKOV. Data farm: Information system for collecting, storing and processing unstructured data from heterogeneous sources. Proceedings of the Institute for System Programming, vol. 35, issue 2, 2023, 57-72. DOI: 10.15514/ISPRAS-2023-35(2)-5.

Full text of the paper: PDF (in Russian).