N. Alachiotis, P. Skrimponis, M. Pissadakis and D. Pnevmatikatos, “Scalable phylogeny reconstruction with disaggregated near-memory processing,” ACM Trans. Reconfigurable Technol. Syst., vol. 15, no. 3, 2022, doi: 10.1145/3484983.
https://doi.org/10.1145/3484983
Disaggregated computer architectures eliminate resource fragmentation in next-generation datacenters by enabling virtual machines to employ resources such as CPUs, memory, and accelerators that are physically located on different servers. While this paves the way for highly compute- and/or memory-intensive applications to potentially deploy all CPUs and/or memory resources in a datacenter, it poses a major challenge to the efficient deployment of hardware accelerators: input/output data can reside on different servers than the ones hosting accelerator resources, thereby requiring time- and energy-consuming remote data transfers that diminish the gains of hardware acceleration. Targeting a disaggregated datacenter architecture similar to the IBM dReDBox disaggregated datacenter prototype, the present work explores the potential of deploying custom acceleration units adjacently to the disaggregated-memory controller on memory bricks (in dReDBox terminology), which is implemented on FPGA technology, to reduce data movement and improve performance and energy efficiency when reconstructing large phylogenies (evolutionary relationships among organisms). A fundamental computational kernel is the Phylogenetic Likelihood Function (PLF), which dominates the total execution time (up to 95%) of widely used maximum-likelihood methods. Numerous efforts to boost PLF performance over the years focused on accelerating computation; since the PLF is a data-intensive, memory-bound operation, performance remains limited by data movement, and memory disaggregation only exacerbates the problem. We describe two near-memory processing models, one that addresses the problem of workload distribution to memory bricks, which is particularly tailored toward larger genomes (e.g., plants and mammals), and one that reduces overall memory requirements through memory-side data interpolation transparently to the application, thereby allowing the phylogeny size to scale to a larger number of organisms without requiring additional memory.