Distributed Data Storage Systems: Analysis, Classification and Choice
There are many distributed data storage systems, and vendors describe their solutions in different terms: cloud storage, distributed file system, cluster file system, and so on. This makes it difficult to select a distributed storage system, because it is not clear which indicators to pay attention to first.
This paper analyzes various distributed data storage systems and possible solutions to the basic problems of the subject area, in particular system scaling, data consistency, availability, and partition tolerance.
In this work we ranked distributed storage systems by various characteristics and chose the top-ranked ones for further analysis. As a result of the analysis, key system development patterns and trends were identified. These trends were then studied for correlations with the systems' functional and non-functional attributes.
Based on this analysis, we classified the systems by different criteria, including the presence or absence of particular functions or attributes. In a comparative study we investigated basic system functionality (archive storage, deduplication, geo-replication, etc.) and system performance (system scalability limits, architecture, operating environment, etc.). In addition, we analyzed safety mechanisms and system self-management tools.
Based on the analysis data and the classification of the systems, we propose methods for selecting distributed data storage systems. The results of this work may be used by researchers and practitioners to make a justified choice of a storage system for their specific needs.
Proceedings of the Institute for System Programming, vol. 27, issue 6, 2015, pp. 225-252.
ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).
DOI: 10.15514/ISPRAS-2015-27(6)-15