
Variant calling is a fundamental process in genomics that aims to identify and characterize genetic variations within a given sample. These variations, known as variants, can be single nucleotide polymorphisms (SNPs), insertions, deletions, or structural rearrangements. By pinpointing these variants, researchers gain insights into the genetic diversity within a population, identify disease-causing mutations, and unravel the complex mechanisms underlying various biological processes.
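As a rough illustration of the variant types listed above, the following Python sketch classifies a variant from its reference and alternate alleles, in the way these are commonly written in a VCF record. The function name and the simple length-based rules are assumptions made for illustration; real callers and annotation tools apply more careful normalization.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT allele pair using simple length-based rules."""
    # Symbolic alleles such as <DEL> and breakend notation ([ or ]) are
    # treated here as structural variants (an illustrative simplification).
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "structural"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    # Same length but longer than one base: a multi-nucleotide substitution.
    return "MNP"


# Hypothetical example alleles, not taken from any real dataset.
examples = [("A", "G"), ("A", "ATT"), ("GCA", "G"), ("N", "<DEL>")]
for ref, alt in examples:
    print(ref, alt, classify_variant(ref, alt))
```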
Perspective from Plant Genetics
In the realm of plant genetics, variant calling plays a pivotal role in understanding the genetic basis of important traits, such as disease resistance, yield, and nutritional content. By identifying variants associated with these traits, breeders can develop improved crop varieties through targeted breeding programs. Additionally, variant calling enables the study of evolutionary processes in plants, shedding light on their adaptation to different environments and their response to climate change.
Programming and Software Available for Variant Calling
Variant calling relies heavily on specialized software. These tools run multi-step pipelines that analyze high-throughput whole-genome sequencing data and identify genetic variants. Programming languages such as Python and R are widely used to build and orchestrate these pipelines: they give researchers the flexibility to customize their analyses, integrate various tools, and automate the variant calling process. The open-source nature of these languages also fosters collaboration and allows best practices and novel pipelines to be shared within the scientific community.
Several sets of programming tools are available for the benchmark analysis pipeline. The Genome Analysis Toolkit (GATK) is the most popular, providing researchers with a comprehensive suite of tools for variant calling. Despite its popularity, other pipelines can outperform it. Specifically, DeepVariant (Poplin et al., 2018) and Illumina’s DRAGEN outperform GATK and other pipelines, particularly in difficult-to-call genomic regions, with greater computational efficiency, higher accuracy, and fewer errors (Betschart et al., 2022). All these tools not only detect variants but also enable downstream analyses, such as variant annotation, functional prediction, and population genetics.
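Comparisons of pipeline accuracy are typically expressed through metrics such as precision, recall, and F1 score, computed by comparing a pipeline's calls against a trusted truth set. The sketch below shows one simple way to compute these in Python. It assumes each variant is represented as a (chromosome, position, ref, alt) tuple and that matching is exact; the function name and the toy data are hypothetical, and dedicated benchmarking tools additionally reconcile different representations of the same variant.

```python
def benchmark(truth: set, calls: set) -> dict:
    """Compare a call set against a truth set using exact tuple matching."""
    tp = len(truth & calls)   # variants called and present in the truth set
    fp = len(calls - truth)   # variants called but absent from the truth set
    fn = len(truth - calls)   # truth variants the pipeline missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Toy data for illustration only (not real benchmark results).
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
         ("chr2", 75, "G", "GA"), ("chr2", 900, "TAC", "T")}
calls = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
         ("chr2", 75, "G", "GA"), ("chr3", 40, "T", "C")}

print(benchmark(truth, calls))
```

Reporting precision and recall separately, rather than a single accuracy figure, makes it visible whether a pipeline errs by calling false variants or by missing true ones.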

Figure: Pipeline for benchmark analysis. The raw input sequence file in .fastq format is mapped and aligned to a reference genome, resulting in a .bam file. The recalibrated .bam file is then used for variant calling, resulting in a .vcf file.
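The stages in the figure can be sketched as a sequence of shell commands assembled in Python. This is a minimal sketch under several assumptions that are not stated in the text: BWA-MEM is used for mapping, samtools for sorting and indexing, and GATK for recalibration and calling; paired-end reads are supplied as two .fastq files; and the reference is already indexed for both BWA and GATK. The function name, file names, and read-group values are hypothetical. The function only builds the command strings and does not run them, and steps often included in practice (such as duplicate marking) are omitted.

```python
import shlex


def build_commands(sample, reference, fastq_1, fastq_2, known_sites):
    """Return shell command strings for a simple FASTQ -> BAM -> VCF workflow."""
    q = shlex.quote
    # GATK expects read-group information; these values are placeholders.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    sorted_bam = f"{sample}.sorted.bam"
    recal_table = f"{sample}.recal.table"
    recal_bam = f"{sample}.recal.bam"
    vcf = f"{sample}.vcf"
    return [
        # 1. Map reads to the reference and write a coordinate-sorted .bam file.
        f"bwa mem -R {q(read_group)} {q(reference)} {q(fastq_1)} {q(fastq_2)}"
        f" | samtools sort -o {q(sorted_bam)} -",
        f"samtools index {q(sorted_bam)}",
        # 2. Base quality score recalibration, producing the recalibrated .bam.
        f"gatk BaseRecalibrator -R {q(reference)} -I {q(sorted_bam)}"
        f" --known-sites {q(known_sites)} -O {q(recal_table)}",
        f"gatk ApplyBQSR -R {q(reference)} -I {q(sorted_bam)}"
        f" --bqsr-recal-file {q(recal_table)} -O {q(recal_bam)}",
        # 3. Call variants from the recalibrated .bam, producing a .vcf file.
        f"gatk HaplotypeCaller -R {q(reference)} -I {q(recal_bam)} -O {q(vcf)}",
    ]


# Hypothetical file names for illustration.
commands = build_commands("sample1", "reference.fasta",
                          "sample1_R1.fastq", "sample1_R2.fastq",
                          "known_sites.vcf")
for command in commands:
    print(command)
```

Building the commands as data before running them makes it straightforward to log, review, or swap out a stage, for example replacing the final step with a different variant caller when benchmarking.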
Challenges and Future Directions
While variant calling has revolutionized the field of genomics, it is not without its challenges. The complexity of the genome, the presence of sequencing errors, and the vast amount of data generated pose significant computational and analytical hurdles. Researchers are continually developing new pipelines and refining existing ones to improve the accuracy and efficiency of variant calling. In the future, variant calling is expected to benefit from advances in technology, such as third-generation sequencing. These technologies will give researchers a more comprehensive view of the genome and enable the detection of complex and rare variants that were previously difficult to identify.
Conclusions
Variant calling is a powerful method that has transformed our understanding of genetic variation. With the availability of sophisticated software and programming tools, researchers can now delve deeper into the intricate world of genetic variation and uncover the secrets hidden within our genomes. As we continue to refine our methods and embrace new technologies, variant calling will undoubtedly remain at the forefront of genomics research, paving the way for groundbreaking discoveries and advancements in various fields.
References
Betschart, R.O., Thiéry, A., Aguilera-Garcia, D., Zoche, M., Moch, H., Twerenbold, R., Zeller, T., Blankenberg, S., Ziegler, A., 2022. Comparison of calling pipelines for whole genome sequencing: an empirical study demonstrating the importance of mapping and alignment. Sci. Rep. 12, 21502. https://doi.org/10.1038/s41598-022-26181-3
Poplin, R., Chang, P.-C., Alexander, D., Schwartz, S., Colthurst, T., Ku, A., Newburger, D., Dijamco, J., Nguyen, N., Afshar, P.T., Gross, S.S., Dorfman, L., McLean, C.Y., DePristo, M.A., 2018. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987. https://doi.org/10.1038/nbt.4235


