Processing of Raw Astronomical Data of Large Volume by MapReduce Model
The exponential growth in volume and improved quality of data in current sky surveys (SDSS, DES, PanSTARRS) and upcoming ones (LSST) open new horizons for astrophysics but require new approaches to data processing, in particular big data technologies and cloud computing. This work presents a MapReduce-based approach to a major computational task in astrophysics: processing of raw astronomical image data. We present the architecture of a Hadoop-based astrophysical pipeline that combines the following data processing steps: background removal, projection, co-addition, PSF modelling, and extraction of sky object features from images. The architecture uses modern implementations of astrophysical image processing algorithms from the software packages SWarp, PSFEx, and SExtractor, integrated into MapReduce procedures. The pipeline steps are grouped into two phases. The first phase, raw data processing, includes background removal, projection, and image co-addition. Its result is a set of preprocessed, co-added images called cells. Adjacent cells overlap along their borders; this interleaving allows large sky objects lying on cell borders to be processed correctly in the second phase. The second phase includes PSF modelling and creation of the sky catalogue by extracting sky object properties from the cells. Experiments showed linear scalability of all processing steps and a small overhead of the Hadoop infrastructure on overall performance. We used single-filter (red) data from the Stripe82 dataset. All experiments were run on the Microsoft Azure HDInsight cloud platform.
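The border interleaving of cells described in the abstract can be sketched as follows: each cell's footprint is extended by a fixed overlap on every side, so a pixel (or sky object) near a border falls into more than one cell and is fully contained in at least one of them. This is a minimal illustration only; the cell size and overlap width are hypothetical values, not taken from the paper.

```python
# Sketch of cell interleaving: a co-added sky image is split into cells
# whose footprints overlap at the borders, so an object on a boundary is
# wholly inside at least one cell. Sizes below are illustrative.

def cells_covering(x, y, cell_size=1000, overlap=50):
    """Return (column, row) indices of every cell whose overlapped
    footprint contains pixel (x, y).

    Cell (i, j) nominally covers [i*cell_size, (i+1)*cell_size) on each
    axis, extended by `overlap` pixels on both sides.
    """
    result = []
    for i in range(max(0, (x - overlap) // cell_size),
                   (x + overlap) // cell_size + 1):
        for j in range(max(0, (y - overlap) // cell_size),
                       (y + overlap) // cell_size + 1):
            lo_x, hi_x = i * cell_size - overlap, (i + 1) * cell_size + overlap
            lo_y, hi_y = j * cell_size - overlap, (j + 1) * cell_size + overlap
            if lo_x <= x < hi_x and lo_y <= y < hi_y:
                result.append((i, j))
    return result
```

An interior pixel maps to exactly one cell, while a pixel within the overlap zone of a border maps to two (or, at a corner, four), which is what lets the second phase measure border-straddling objects from a single cell that contains them entirely.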
Proceedings of the Institute for System Programming, vol. 27, issue 6, 2015, pp. 315-334.
ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).
DOI: 10.15514/ISPRAS-2015-27(6)-20. Full text of the paper (in Russian).