Developing scalable software infrastructure for data storage and processing for computational biology problems.
This article gives an overview of a scalable infrastructure for storing and processing genome data in genetics problems. It describes the technologies used and the organization of unified access to the genome-processing APIs of different underlying services. It also covers methods for supporting scalability and cloud computing technologies. The first service of the virtual genome-processing laboratory is presented; it solves the problem of predicting transcription factor binding sites. The main principles of service construction are given, along with the basic requirements for the underlying computation software in virtual laboratory environments. The overview describes the implemented web service (https://api.ispras.ru/demo/gen) for transcription factor binding site prediction. The solution is built on three components: the ISPRAS API project, which serves as an API gateway and load balancer; middleware task-manager software, which supports a pool of workers and communicates with the OpenStack infrastructure; and OpenZFS, which serves as intermediate storage with transparent compression. The described solution is easy to extend with new services that meet the basic requirements.
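The task-manager role described above (accepting jobs and dispatching them to a pool of workers) can be illustrated with a minimal sketch. This is not the implementation from the paper: the class `TaskManager`, the function `find_motif_sites`, and the exact-match search it performs are hypothetical stand-ins chosen only to show the pattern, and a local thread pool replaces the OpenStack-backed workers.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def find_motif_sites(sequence: str, motif: str) -> list[int]:
    """Hypothetical stand-in for a prediction tool.

    Returns every 0-based position where the motif occurs exactly.
    A real binding site predictor would use a scoring model instead.
    """
    sequence, motif = sequence.upper(), motif.upper()
    width = len(motif)
    return [
        start
        for start in range(len(sequence) - width + 1)
        if sequence[start:start + width] == motif
    ]


class TaskManager:
    """Toy task manager (assumed design): hands jobs to a fixed worker pool."""

    def __init__(self, workers: int = 4) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._tasks: dict[int, Future] = {}
        self._next_id = 0

    def submit(self, sequence: str, motif: str) -> int:
        """Queue a job and return its task id immediately."""
        task_id = self._next_id
        self._next_id += 1
        self._tasks[task_id] = self._pool.submit(find_motif_sites, sequence, motif)
        return task_id

    def result(self, task_id: int) -> list[int]:
        """Block until the given task finishes and return its result."""
        return self._tasks[task_id].result()

    def shutdown(self) -> None:
        """Wait for running jobs and release the workers."""
        self._pool.shutdown(wait=True)
```

A caller would submit a sequence and a motif with `submit`, keep the returned task id, and later fetch the positions with `result`. Separating submission from retrieval is what lets a gateway answer a request immediately while the computation runs on a worker.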
Proceedings of the Institute for System Programming, vol. 26, issue 4, 2014, pp. 45-54.
ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).
DOI: 10.15514/ISPRAS-2014-26(4)-4
The full text of the paper is available in PDF (in Russian).