Germline Short Variant Discovery and Annotation Pipeline using GATK Tool

Authors

  • Ashraf Bourawy, Department of Computer Science, Faculty of Science, Omar AL-Mukhtar University, Albayda, Libya, https://orcid.org/0000-0001-8161-9291
  • Abdalmunam Abdalla, Department of Computer Science, Faculty of Science, Omar AL-Mukhtar University, Albayda, Libya

Keywords:

Germline, Variant discovery, GATK, Variant Annotation.

Abstract

Background and aims. Identifying variants related to genetic diseases has become affordable thanks to advances in whole genome sequencing (WGS) enabled by improvements in next-generation sequencing technology. Germline and somatic variants are discovered with bioinformatics pipelines built from specialized tools. However, the performance and workflow of these tools still need to be evaluated. This study investigates the Genome Analysis Toolkit (GATK) pipeline for discovering and annotating germline short variants, specifically the calling of single nucleotide polymorphisms (SNPs) and short insertions and deletions (Indels).

Methods. Several tools and packages are used in the variant calling workflow, with the focus on evaluating GATK. Some tools work alongside GATK, such as the Picard package, and others are needed for data preprocessing before GATK is applied, such as BWA and SAMtools. The human reference genome is used for mapping (alignment), and paired-end sequence reads are the input for germline variant discovery. The study covers data preprocessing, variant discovery, and variant filtering and annotation.

Results. The data preprocessing steps showed good sequence quality, base quality scores, and adapter content, which justified proceeding with the downstream workflow. Variant calling was then performed with the GATK tools, yielding SNPs and Indels in two separate files. Filtering and annotation were applied to the discovered variants, producing an Excel file. This file lists the variants found, annotated by comparison against data sources from well-known databases.

Conclusion. Upon completion of the GATK pipeline, germline variants (SNPs and Indels) were discovered and an Excel file containing all the associated information was produced. Specialized scientists can conveniently use this file for further analysis.
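
As an illustration only, the workflow outlined in the abstract (alignment with BWA, sorting with SAMtools, duplicate marking with Picard, variant calling with GATK, and separation of SNPs and Indels into two files) might be sketched as the sequence of commands below. This is a minimal sketch under stated assumptions, not the authors' actual pipeline: the function name, file names, sample name, read-group platform, and the choice to invoke Picard's MarkDuplicates through the GATK4 launcher are all hypothetical. It assumes the reference has already been indexed for BWA and GATK, and it omits the quality-control, filtering, and annotation steps because the abstract does not name the tools or thresholds used.

```python
import shlex


def build_pipeline_commands(reference, reads_1, reads_2, sample):
    """Return command lines (as argument lists) for a minimal germline
    short-variant workflow. Nothing is executed here."""
    # All output file names below are hypothetical, chosen for illustration.
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    raw_vcf = f"{sample}.raw.vcf.gz"
    # GATK requires read-group information; PL:ILLUMINA is an assumption.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"

    return [
        # 1. Align paired-end reads to the reference genome.
        ["bwa", "mem", "-R", read_group, "-o", sam,
         reference, reads_1, reads_2],
        # 2. Coordinate-sort the alignments and write BAM.
        ["samtools", "sort", "-o", sorted_bam, sam],
        # 3. Mark duplicate reads (Picard tool, run via the GATK4 launcher).
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam,
         "-M", f"{sample}.dup_metrics.txt"],
        # 4. Index the BAM so the variant caller can access it by region.
        ["samtools", "index", dedup_bam],
        # 5. Call germline SNPs and Indels together.
        ["gatk", "HaplotypeCaller", "-R", reference, "-I", dedup_bam,
         "-O", raw_vcf],
        # 6. Split the calls into two files: SNPs and Indels.
        ["gatk", "SelectVariants", "-R", reference, "-V", raw_vcf,
         "--select-type-to-include", "SNP",
         "-O", f"{sample}.snps.vcf.gz"],
        ["gatk", "SelectVariants", "-R", reference, "-V", raw_vcf,
         "--select-type-to-include", "INDEL",
         "-O", f"{sample}.indels.vcf.gz"],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for cmd in build_pipeline_commands(
        "reference.fasta", "reads_1.fastq.gz", "reads_2.fastq.gz", "sample1"
    ):
        print(shlex.join(cmd))
```

Building the commands as argument lists keeps the sketch inspectable without the tools installed; each list could be passed to `subprocess.run(cmd, check=True)` once the inputs and tool versions are confirmed.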

Published

2023-08-07

How to Cite

1. Ashraf Bourawy, Abdalmunam Abdalla. Germline Short Variant Discovery and Annotation Pipeline using GATK Tool. Alq J Med App Sci [Internet]. 2023 Aug. 7 [cited 2024 Dec. 22];:424-32. Available from: https://uta.edu.ly/journal/index.php/Alqalam/article/view/318

Section

Articles