GPU-Accelerated Linear Algebra for Large-Scale DFT with CRYSTAL
-
Dear CRYSTAL community,
We’re excited to share our recent work on accelerating linear algebra operations in the CRYSTAL code using GPUs. Our implementation boosts the performance of self-consistent field (SCF) calculations by offloading key matrix operations like multiplication, diagonalization, inversion, and Cholesky decomposition to GPUs.
In the manuscript, we first analyze the performance and limitations of the standard parallel version of the code (Pcrystal), and then evaluate how the new GPU-accelerated approach scales from 1 to 8 GPUs, observing remarkable scaling. To highlight these improvements, we present benchmark results on several systems, such as the example below.
We expected significant speedups for large systems, where only a few k points are sampled but each requires substantial computational effort. To ensure a fair comparison, we ran calculations with the massively parallel version of CRYSTAL (MPPcrystal) on a large MOF structure with over 30,000 basis functions. Surprisingly, a single GPU on one node performed comparably to 512–1024 CPU cores running across 4–8 nodes.
To find out more, read the full paper here.
We aim to make this GPU-accelerated version of CRYSTAL available in the upcoming release, allowing all users to benefit from its enhanced performance for large-scale simulations. We look forward to hearing your thoughts and discussing potential applications or further improvements.
A big thanks to Lorenzo Donà, Chiara Ribaldone, and Filippo Spiga for their contributions to the development of this code!