Hi Fabio,
After an in-depth investigation, I can confirm that the behavior you're observing is not due to uninitialized variables. The most likely explanation is the non-deterministic order of summation in reduction operations, a known effect of floating-point arithmetic under OpenMP parallelization.
In OpenMP, when large arrays or matrices are reduced (e.g., when summing elements of the density or Fock matrices), the order in which partial results are combined can change depending on how the threads are scheduled. Since floating-point addition is not associative, this can lead to slightly different results depending on the number of threads, how the work is distributed among them, or the memory layout at runtime, even when the input is exactly the same.
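The effect can be illustrated without OpenMP at all. The following Python sketch is only a toy model under stated assumptions: the data are synthetic, each hypothetical "thread" sums one contiguous chunk, and the random order in which partial sums are combined merely mimics threads finishing at different times. It is not how the program actually implements its reductions.

```python
import random


def sequential_sum(values):
    """Left-to-right summation, as a single thread would do it."""
    total = 0.0
    for v in values:
        total += v
    return total


def partial_sums(values, n_threads):
    """Split the data into contiguous chunks, one per hypothetical thread,
    and sum each chunk independently."""
    chunk = (len(values) + n_threads - 1) // n_threads
    return [sequential_sum(values[i:i + chunk])
            for i in range(0, len(values), chunk)]


def simulated_reduction(values, n_threads, rng):
    """Combine the per-thread partial sums in a random order, mimicking
    threads that finish (and update the shared total) unpredictably."""
    partials = partial_sums(values, n_threads)
    rng.shuffle(partials)
    return sequential_sum(partials)


# Floating-point addition is not associative:
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

# Synthetic values spanning many orders of magnitude (not real matrix data).
data_rng = random.Random(42)
data = [data_rng.uniform(-1.0, 1.0) * 10.0 ** data_rng.randint(-8, 2)
        for _ in range(100_000)]

# Unseeded on purpose: each simulated "run" may combine partials differently.
run_rng = random.Random()
for n_threads in (1, 16):
    results = {simulated_reduction(data, n_threads, run_rng) for _ in range(5)}
    # With 1 thread there is always exactly one distinct result.
    print(f"{n_threads:2d} thread(s): {len(results)} distinct result(s)")
```

With a single chunk there is nothing to reorder, so every run is bit-for-bit identical, which parallels the single-thread behavior described below; with 16 chunks the last few digits typically vary from run to run.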
To confirm this, I ran your script with the public OpenMP build twice using 16 threads and twice using a single thread. The SCF energies of each pair of runs are compared below.
- With 16 threads, the two runs agree exactly up to CYC 5. Small differences appear from CYC 6 onward (about 1.3E-06 at CYC 6), and the second run needed two extra cycles to converge, but the final energies differ by only about 1E-10, well within the SCF convergence tolerance.
- With a single thread (i.e., the OpenMP build running serially), the two runs are fully reproducible, with no differences at all.
Using more threads may increase both the likelihood and the size of these small numerical differences, although they typically remain insignificant in practice. For this reason, we recommend limiting the number of OpenMP threads to 4 and relying primarily on MPI for larger-scale parallelism, which generally offers a better balance between performance and numerical reproducibility.
Additionally, it's good practice in our workflow to run each job in a clean, isolated SCRATCH folder, to avoid unintended interference from leftover files or cached data. Your current script doesn't do this, so we recommend updating it to create a separate, empty working directory for each run (or to clean the existing one before each run).
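As one possible way to do this, here is a minimal Python sketch that copies the inputs into a freshly created temporary directory, runs the job there with `OMP_NUM_THREADS` set to 4, copies selected outputs back, and then deletes the scratch folder. The function name `run_in_fresh_scratch`, its parameters, and all file and executable names are hypothetical placeholders; adapt them to your actual launcher and files.

```python
import os
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_in_fresh_scratch(command, input_files, results_dir, outputs=(),
                         omp_threads=4, scratch_root=None):
    """Run `command` (a hypothetical launcher command list) inside a new,
    empty scratch directory and copy the listed output files back."""
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    # A brand-new directory per run: no leftover files from earlier jobs.
    scratch = Path(tempfile.mkdtemp(prefix="scf_job_", dir=scratch_root))
    try:
        for f in input_files:
            shutil.copy(f, scratch)
        # Limit OpenMP threads for this run only. Note: some MPI launchers
        # must be told explicitly to forward this variable to remote nodes.
        env = dict(os.environ, OMP_NUM_THREADS=str(omp_threads))
        proc = subprocess.run(command, cwd=scratch, env=env,
                              capture_output=True, text=True, check=True)
        # Keep only the requested outputs, plus the captured stdout.
        for name in outputs:
            src = scratch / name
            if src.exists():
                shutil.copy(src, results_dir / name)
        (results_dir / "stdout.txt").write_text(proc.stdout)
        return proc.stdout
    finally:
        # Remove the scratch folder even if the job failed.
        shutil.rmtree(scratch, ignore_errors=True)
```

For example, a hybrid MPI/OpenMP job might be launched as `run_in_fresh_scratch(["mpirun", "-np", "8", "/path/to/executable"], ["job.inp"], "results/run1", outputs=["job.out"])`, where the process count, executable path, and file names are placeholders.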
Energies at each SCF cycle: copper bulk
-----------------------------------------------------------------------------
OpenMP, 16 threads
-----------------------------------------------------------------------------
          run 1                 run 2                 diff (run 1 - run 2)
CYC 0     -1.959786174741E+02   -1.959786174741E+02    0.0
CYC 1     -1.959830545289E+02   -1.959830545289E+02    0.0
CYC 2     -1.959842257372E+02   -1.959842257372E+02    0.0
CYC 3     -1.959843727892E+02   -1.959843727892E+02    0.0
CYC 4     -1.959844091796E+02   -1.959844091796E+02    0.0
CYC 5     -1.959844090374E+02   -1.959844090374E+02    0.0
CYC 6     -1.959844090603E+02   -1.959844077969E+02   -1.2634E-06
CYC 7     -1.959844090705E+02   -1.959844090714E+02    9.0E-10
CYC 8     -                     -1.959844090772E+02    6.7E-09 (*)
CYC 9     -                     -1.959844090706E+02    1.0E-10 (*)
(*) Run 1 converged at CYC 7; for CYC 8 and 9 the diff is taken against its final energy, -1.959844090705E+02.
-----------------------------------------------------------------------------
OpenMP, 1 thread
-----------------------------------------------------------------------------
          run 1                 run 2                 diff (run 1 - run 2)
CYC 0     -1.959786174741E+02   -1.959786174741E+02    0.0
CYC 1     -1.959830545289E+02   -1.959830545289E+02    0.0
CYC 2     -1.959842257372E+02   -1.959842257372E+02    0.0
CYC 3     -1.959843727892E+02   -1.959843727892E+02    0.0
CYC 4     -1.959844091796E+02   -1.959844091796E+02    0.0
CYC 5     -1.959844090374E+02   -1.959844090374E+02    0.0
CYC 6     -1.959844090603E+02   -1.959844090603E+02    0.0
CYC 7     -1.959844090705E+02   -1.959844090705E+02    0.0
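The diff column of the 16-thread table can be reproduced with a short Python sketch. It assumes, as the numbers suggest, that run 1 stopped at CYC 7 and that its final energy is reused for the later cycles of run 2; the helper name `scf_energy_diffs` is hypothetical. The same helper could be used to compare per-cycle energies extracted from any two of your own runs.

```python
def scf_energy_diffs(run1, run2):
    """Per-cycle differences run1 - run2.

    If one run converged in fewer cycles, its final energy is reused for the
    remaining cycles of the other run (assumed to match the CYC 8-9 rows).
    """
    n = max(len(run1), len(run2))
    pad1 = run1 + [run1[-1]] * (n - len(run1))
    pad2 = run2 + [run2[-1]] * (n - len(run2))
    return [a - b for a, b in zip(pad1, pad2)]


# Energies copied from the 16-thread table (run 1: CYC 0-7, run 2: CYC 0-9).
run1 = [-1.959786174741E+02, -1.959830545289E+02, -1.959842257372E+02,
        -1.959843727892E+02, -1.959844091796E+02, -1.959844090374E+02,
        -1.959844090603E+02, -1.959844090705E+02]
run2 = [-1.959786174741E+02, -1.959830545289E+02, -1.959842257372E+02,
        -1.959843727892E+02, -1.959844091796E+02, -1.959844090374E+02,
        -1.959844077969E+02, -1.959844090714E+02, -1.959844090772E+02,
        -1.959844090706E+02]

for cyc, d in enumerate(scf_energy_diffs(run1, run2)):
    print(f"CYC {cyc}  {d: .4e}")
```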