Charalabos Theodoris, "Heterogeneous computing for large-scale linkage-disequilibrium analyses on the Aris supercomputer", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2020
https://doi.org/10.26233/heallink.tuc.87871
Linkage disequilibrium (LD) is the non-random association between alleles at different loci. In genomics, breakthroughs in DNA extraction and sequencing technologies have produced huge databanks of genomic data that continue to grow every day. With these data grows the need for high-performance tools to analyze them. The prevailing method for calculating LD in genomes uses single nucleotide polymorphisms (SNPs), recording the presence or absence of the minor allele at each site. Most software implementations to date cannot yet efficiently meet the time and memory requirements expected of future large-scale genomic analyses. To answer the need for fast, scalable genomic analysis, we developed qLD (quickLD), a standalone software tool (https://github.com/StrayLamb2/qLD). qLD builds on prior observations that LD can be computed efficiently through general matrix multiplications, which allows it to employ existing optimized computational kernels for the LD calculation. In addition, qLD applies memory-aware techniques to reduce memory requirements and executes in parallel on both CPUs and GPUs to further shorten execution times. In single-threaded execution, qLD is up to 28x faster than the current state-of-the-art software when run on the same CPU, and up to 44x faster when the computation is offloaded to a GPU. In multi-threaded execution, we observed speedups of up to 60x over the same state-of-the-art software using the same number of threads. qLD also addresses a missing feature of state-of-the-art tools: the ability to quantify allele associations between arbitrarily distant loci, thereby facilitating the evaluation of long-range LD and the detection of co-evolved genes. We showcase qLD on the analysis of 22,554 complete SARS-CoV-2 genomes.
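The matrix-multiplication formulation of LD can be illustrated with a small sketch. It assumes haploid sequences encoded as a 0/1 matrix (rows are sequences, columns are SNPs, 1 marks the minor allele) and uses the common r² measure, r² = D² / (p_i(1 - p_i) p_j(1 - p_j)) with D = p_ij - p_i p_j. The function names `transpose_matmul` and `ld_r2` are hypothetical; this is not qLD's code. The core step is the product AᵀB, whose entries count sequences carrying the minor allele at both SNPs; allele frequencies and r² follow from those counts. Passing two different regions as A and B gives LD between arbitrarily distant loci.

```python
from typing import List

Matrix = List[List[int]]  # rows = sequences, columns = SNPs (0/1 encoded)


def transpose_matmul(a: Matrix, b: Matrix) -> List[List[int]]:
    """Compute A^T B for 0/1 matrices sharing the same sequences (rows).

    Entry (i, j) counts sequences carrying the minor allele at both SNP i
    of A and SNP j of B. In an optimized tool, a tuned GEMM kernel would
    replace this plain-Python loop (an assumption for illustration).
    """
    ma, mb = len(a[0]), len(b[0])
    out = [[0] * mb for _ in range(ma)]
    for row_a, row_b in zip(a, b):
        for i in range(ma):
            if row_a[i]:
                for j in range(mb):
                    out[i][j] += row_b[j]
    return out


def ld_r2(a: Matrix, b: Matrix) -> List[List[float]]:
    """Return the r^2 matrix between every SNP of A and every SNP of B."""
    n = len(a)
    counts = transpose_matmul(a, b)
    # Minor-allele frequency of each SNP column.
    freq_a = [sum(row[i] for row in a) / n for i in range(len(a[0]))]
    freq_b = [sum(row[j] for row in b) / n for j in range(len(b[0]))]
    r2 = []
    for i, pa in enumerate(freq_a):
        line = []
        for j, pb in enumerate(freq_b):
            d = counts[i][j] / n - pa * pb  # D = p_ij - p_i * p_j
            denom = pa * (1 - pa) * pb * (1 - pb)
            # Monomorphic SNPs have zero variance; report r^2 = 0 for them.
            line.append(d * d / denom if denom > 0 else 0.0)
        r2.append(line)
    return r2


if __name__ == "__main__":
    region_a = [[1, 0], [1, 0], [0, 1], [0, 1]]  # 4 sequences, 2 SNPs
    region_b = [[1, 1], [1, 0], [0, 1], [0, 0]]  # same 4 sequences, 2 SNPs
    print(ld_r2(region_a, region_b))  # [[1.0, 0.0], [1.0, 0.0]]
```

Calling `ld_r2` with the same region for both arguments yields the usual within-window LD matrix, while two distant regions yield a rectangular cross-region matrix, which is the kind of long-range comparison the abstract describes.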