Modification of the short read alignment algorithm to improve the quality of the human whole genome sequencing data processing pipeline


Egor Pavlovich GUGUCHKIN, Evgeny Andreevich KARPULEVICH

Abstract

This study addresses the alignment of short reads, a key step in the analysis of human whole-genome sequencing data. Alignment determines the positions of short sequenced fragments (reads) relative to a known human reference genome sequence. Traditional alignment methods use a linear reference sequence, which can lead to incorrect alignment, especially when short reads contain genetic variations. In this work, the index file of the reference sequence built by the minimap2 tool was modified. Experimental results showed that adding information about frequently occurring genetic variations to the minimap2 index increases the number of correctly identified genetic variants, which improves the quality of subsequent data analysis.
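The effect described above can be illustrated with a toy sketch. It is not minimap2's actual index format or the authors' modification: it uses plain k-mers instead of minimizers, and the names `build_kmer_index` and `count_seed_hits`, the reference string, and the variant are all hypothetical. It only shows why a read carrying a known variant loses seed matches against a linear index, and how adding alternate-allele k-mers could recover them.

```python
from collections import defaultdict


def build_kmer_index(reference, k, variants=()):
    """Map each k-mer to the set of reference positions where it starts.

    `variants` is an iterable of (position, alt_base) single-base
    substitutions (an assumption made for this sketch). For each one,
    the k-mers that overlap the position are also indexed with the
    alternate base substituted in.
    """
    index = defaultdict(set)
    # Seeds from the linear reference.
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].add(i)
    # Extra seeds that carry the alternate allele.
    for pos, alt in variants:
        alt_ref = reference[:pos] + alt + reference[pos + 1:]
        first = max(0, pos - k + 1)
        last = min(pos, len(reference) - k)
        for i in range(first, last + 1):
            index[alt_ref[i:i + k]].add(i)
    return index


def count_seed_hits(read, index, k):
    """Count how many k-mers of the read are present in the index."""
    return sum(1 for j in range(len(read) - k + 1) if read[j:j + k] in index)


K = 5
reference = "ACGTTGCAAGGCTTACGGATCCAT"
variants = [(11, "A")]   # reference[11] is 'C'; a hypothetical common SNP
read = "CAAGGATTACGG"    # reference[6:18] carrying the alternate allele

linear_index = build_kmer_index(reference, K)
augmented_index = build_kmer_index(reference, K, variants)

print(count_seed_hits(read, linear_index, K))     # 3 of 8 read k-mers match
print(count_seed_hits(read, augmented_index, K))  # all 8 read k-mers match
```

In this toy case the five read k-mers that overlap the variant find no seed in the linear index, while the augmented index matches all eight. A real aligner would still need to chain and extend these seeds before reporting an alignment.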

Keywords

data processing pipeline, DNA sequencing, computational biology, sequence alignment methods, NGS data analysis, computational methods

Edition

Proceedings of the Institute for System Programming, vol. 35, issue 2, 2023, 235-248

ISSN 2220-6426 (Online), ISSN 2079-8156 (Print).

DOI: 10.15514/ISPRAS-2023-35(2)-17

For citation

Egor Pavlovich GUGUCHKIN, Evgeny Andreevich KARPULEVICH. Modification of the short read alignment algorithm to improve the quality of the human whole genome sequencing data processing pipeline. Proceedings of the Institute for System Programming, vol. 35, issue 2, 2023, 235-248. DOI: 10.15514/ISPRAS-2023-35(2)-17.
