Best posts made by GiacomoAmbrogio


The CRYSTAL code will have a dedicated full day at the upcoming Graduate School on Electronic Structure Theory at STFC Daresbury Laboratory, taking place 24–28 February 2025!

This event is a great opportunity to:

Dive deep into the fundamentals of electronic structure theory.
Gain hands-on experience with CRYSTAL, alongside other leading software packages like CASTEP and QUESTAAL.
Learn both command-line and ASE (Python-based) interfaces.
Connect with peers and experts in the field.

Whether you're a doctoral researcher, postdoc, or someone looking to enhance your expertise in electronic structure codes, the dedicated CRYSTAL day offers a valuable opportunity to develop a solid understanding of the code's features, gain practical experience, and understand how it fits into the broader landscape of computational tools.

Event Details
Dates: 24–28 February 2025
Application deadline: 19 January 2025.
Venue: Science and Technology Facilities Council, Daresbury Laboratory, UK.
Apply Now: https://ccp9.ac.uk/graduate_school_2025/

Take advantage of this opportunity to expand your knowledge and experience with the CRYSTAL code in a collaborative and dynamic environment. We look forward to seeing you there!


Hi Antonio,
Unfortunately, I don't think there is any specific print statement in the .out file that indicates whether the calculation was run using the SDFT or the SCDFT formalism. We may add more information about this in future versions of the code!

I guess the only way to tell is by looking at the input file. A good practice could be to print the input file at the top of the output in your launch script, so that you always have a reference to your input for each output you generate.
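As a minimal sketch of this practice (the helper name `run_with_input_header` and the banner lines are illustrative assumptions, not part of the CRYSTAL launch scripts), a launch script could copy the input deck to the top of the output file before appending the program output:

```shell
# Sketch: write a copy of the input deck at the top of the output file,
# then append whatever the given command prints below it.
# Assumption: run_with_input_header is a hypothetical helper, not part of CRYSTAL.
run_with_input_header() {
    local input_file="$1"    # e.g. mycalc.d12
    local output_file="$2"   # e.g. mycalc.out
    shift 2                  # remaining arguments: the command that runs CRYSTAL

    {
        echo "===== INPUT FILE: ${input_file} ====="
        cat "${input_file}"
        echo "===== END OF INPUT FILE ====="
    } > "${output_file}"

    # Run the command and append both stdout and stderr to the output file
    "$@" >> "${output_file}" 2>&1
}

# Usage (assumes the deck has already been copied to a file named INPUT
# in the working directory, and that the executable path is adjusted):
#   run_with_input_header mycalc.d12 mycalc.out mpirun -np 4 path/to/crystal/exe/Pcrystal
```

Passing the run command as trailing arguments keeps the helper independent of how CRYSTAL is launched on a given machine.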


In any case, you can take a look at this link. Here, you will find instructions on how to install the parallel version of CRYSTAL, both from executable and object files (see paragraphs 2 and 3). The instructions refer to the latest version of the code, which is CRYSTAL23, but I think the procedure is similar for CRYSTAL17.

Another useful reference is the tutorial on "How to run", see overview and parallel run sections.

Hope this helps!


Dear CRYSTAL community,

We’re excited to share our recent work on accelerating linear algebra operations in the CRYSTAL code using GPUs. Our implementation boosts the performance of self-consistent field (SCF) calculations by offloading key matrix operations like multiplication, diagonalization, inversion, and Cholesky decomposition to GPUs.

In the manuscript, we first analyze the performance and limitations of the standard parallel version of the code (Pcrystal) and then we evaluate the scalability of the new GPU-accelerated approach with 1 to 8 GPUs, observing remarkable scaling. To highlight these improvements, we present benchmark results on different systems, such as the example below.

We expected significant speedups for large systems due to the limited number of k points, each requiring substantial computational effort. To ensure a fair comparison, we ran calculations using the massively parallel version of CRYSTAL (MPPcrystal) on a large MOF structure with over 30000 basis functions. Surprisingly, a single GPU on one node performed comparably to 512–1024 CPU cores running across 4–8 nodes.

To find out more, read the full paper here.

We aim to make this GPU-accelerated version of CRYSTAL available in the upcoming release, allowing all users to benefit from its enhanced performance for large-scale simulations. We look forward to reading your thoughts and discussing potential applications or further improvements.

A big thanks to Lorenzo Donà, Chiara Ribaldone, and Filippo Spiga for their contributions to the development of this code!


Hi othmen1983,
What you need to do next depends on whether you have a Pcrystal executable or the OBJ files to compile.

If you have the executable, you can check if it runs correctly on your machine by downloading the input file INPUT and placing it in an empty folder (ensure the file is named INPUT, without any extension).
After that you can run this command in the same folder:

mpirun -np 2 path/to/crystal/exe/Pcrystal

If you see the output correctly on your screen, it means the executable is working as intended. The final step is to create launch scripts to manage input, output, and temporary files.

Example scripts for CRYSTAL23 are available here (they should also work with CRYSTAL17). A brief explanation on how to use them can be found in the How to run tutorial.

Unfortunately, these scripts only work on simple machines, such as workstations. If you need to run Pcrystal on a cluster with a queuing system like Slurm, you will need dedicated scripts.


Hi Danny,

From a first look at your input file, I noticed that you are performing a combined PDOS and BANDS calculation. This has not been fully tested as we prefer to run two separate jobs, one for PDOS and one for BANDS. Once the harmonic frequencies are computed, they can both be run through restarts and are both very fast.

Indeed, the PDOS calculation produces a single set of data if run without the BANDS keyword.

I therefore suggest splitting your calculation into two separate ones: one for PDOS and one for BANDS. You can use the RESTART keyword in the FREQCALC block to avoid recomputing the Hessian matrix (see page 219 of the User manual).

Here is a link to a tutorial webpage, which we have recently updated, about computing phonon-related properties with CRYSTAL.

Let me know if this helps!


Hi job314,
After looking into your issue further, I’m following up with more information.

The B973C functional is a composite method with built-in corrections specifically designed for the mTZVP basis set. Modifying the basis set can introduce errors and is not the right approach. This method and basis set were primarily developed for molecular systems and, at most, molecular crystals, not bulk materials like yours.

Explicit warnings about this functional can be found in the user manual on page 161.

Given this, I recommend choosing a different functional and basis set better suited for your system.


Hi R.Zosiamliana,

When using SLABCUT from a 3D system, or directly a SLAB input, CRYSTAL treats the resulting system as a lower-dimensional 2D structure. Specifically, the system retains periodicity only along two directions (conventionally in the x-y plane), while periodicity is inherently absent along the orthogonal direction (z).

Just to preserve the same output format as for a 3D calculation, an arbitrary value of 500 Å is printed in the output for the c lattice vector, even though it does not exist for such calculations. We understand this may be confusing, and we are considering removing it in future versions.

To summarise, in CRYSTAL when running a 2D (or 1D, or 0D) calculation, there is no need to repeat the structure along the non-periodic direction and thus there is no need to define a vacuum.


Hi piquini,
You could try to add this to your mpirun line:

mpirun -np 4 $CRY23_EXEDIR/$VERSION/Pcrystal |& tee $PBS_O_WORKDIR/${PBS_JOBNAME}.out

Let me know if it works!


Hi heimurinn,

The error you're seeing is not actually related to the symmetry adaptation of the Bloch functions. What’s happening is that the error is triggered by another processor before the one responsible for writing the output reaches the actual failure point.

If you run the same calculation on a single process (serial mode), you should see the real error, which for 81_from1.d12 is the following (I didn’t try 81_to1403845.d12, but I suspect it will be the same):
ERROR **** GROTA1 **** ERROR IN SYMMETRY EQUIVALENCE - CHECK INPUT COORDINATES

This issue is likely related to symmetry. One thing you can try is to follow the structure standardization procedure described in this thread.

If you have any questions or need further help, feel free to ask!


Hi ywang,
If you are running the parallel version of CRYSTAL, could you try running it on a single process?
Sometimes, when running in parallel, an error can terminate the job before the output is printed, especially during the input reading section.


Hi Job,
It seems there might be a bit of confusion regarding how to report MPI bindings. We may update the tutorial to make this clearer.

To display binding information correctly, you’ll need to include the appropriate flag in your usual mpirun command. For example:

If you're using OpenMPI:

mpirun --report-bindings -np 192 Pcrystal

If you're using Intel MPI:

mpirun -print-rank-map -np 192 Pcrystal

To check which MPI implementation you're using, you can inspect the loaded modules in your environment, or simply run:

mpirun --version

If you're using a different MPI implementation, feel free to let me know. I'd be happy to help you find the right way to print the bindings.
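As a rough sketch, the choice between the two flags could be automated from the text printed by `mpirun --version`. The helper name `binding_flag_for` is hypothetical, and the matched strings ("Open MPI", "Intel") are assumptions about typical version banners that may need adjusting on your system:

```shell
# Sketch: pick a binding-report flag from the `mpirun --version` banner.
# Assumption: binding_flag_for is a hypothetical helper; the banner strings
# matched below are typical examples, not an exhaustive list.
binding_flag_for() {
    local version_text="$1"
    case "${version_text}" in
        *"Open MPI"*) printf '%s\n' "--report-bindings" ;;  # OpenMPI
        *"Intel"*)    printf '%s\n' "-print-rank-map" ;;    # Intel MPI
        *)            printf '%s\n' "" ;;                   # unknown: no flag
    esac
}

# Usage (requires mpirun on the PATH):
#   flag=$(binding_flag_for "$(mpirun --version 2>&1)")
#   mpirun ${flag} -np 192 Pcrystal
```

An unrecognised implementation yields an empty flag, so the run proceeds without binding information rather than failing on an unknown option.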


Hi job314,

I tried to run your input (without the RESTART keyword), and it seems to work fine.
Can you double-check the file used for the restart? Alternatively, can you try running the code without RESTART?

Anyway, the proper way to perform a spin-polarized calculation in DFT is to use the SPIN keyword in the DFT block instead of UHF (but technically, both should work):

[...]
DFT
SPIN
R2SCAN
XLGRID
ENDdft
[...]


Hi ywang,

We managed to resolve the problem with your input file. The issue was that the system was not standardized according to the space group.

It should be possible to tell the program that the structure is not standard through a specific keyword. However, an alternative approach is to manually modify the CIF file. This can be done using VESTA; please refer to the image.

The new input geometry looks like this:

Fe_complex
CRYSTAL
0 0 0
225
26.7506
6
8  0.05353    0.18260    0.24189
26 0.50000    0.21143    0.21143
6  0.20297    0.20297    0.07018
6  0.17805    0.17805    0.11438
6  0.13562    0.13562    0.19929
1  0.12115    0.12115    0.22832
ENDgeom

Fe.d12

This has also reduced the number of symmetry-irreducible atoms, so the calculation should be a bit faster.


We ran some tests regarding the inclusion of the keyword NOSYMADA in relation to anisotropic shrinking factors, and I can confirm that we did not find any significant difference. Therefore, anisotropic shrinking is fully compatible with symmetry-adapted Bloch functions.

Taking a closer look at the data and input/output files you submitted, the convergence with respect to the shrinking factors appears to be very similar. Comparisons of calculations using the same SHRINK value consistently show differences well within the default tolerance (10^-6 Ha/cell), except for one value (highlighted in red in the screenshot). In this case, the energy seems to be significantly off the trend. Could you please double-check that specific calculation?

Screenshot 2025-04-16 112322.png


Hi Yachao_su,

The relation you mention between MPI processes and efficiency is correct, but a bit of clarification might be necessary: limitations on efficiency apply only to the linear algebra steps and the density matrix construction (specifically, the FDIK and PDIG steps) during the SCF procedure.

In the FDIK section we are essentially diagonalizing a set of Fock matrices (of size n atomic orbitals x n atomic orbitals), one for each k point. When using symmetry-adapted Bloch functions, these matrices become block-diagonal, and each block corresponds to an irreducible representation (irrep). These blocks can then be diagonalized independently (by different MPI processes).

Diagonalization scales cubically with matrix size. However, for small systems, this time is usually negligible compared to other SCF steps. So in such cases, there's no need to worry too much about how many MPI processes you're using.
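As a rough illustration of why the block structure matters, here is an idealised estimate. It assumes exactly cubic scaling and $m$ equal-sized blocks, which real systems will generally not have; $N$ (matrix size), $m$ (number of blocks) and $n_i$ (size of block $i$) are symbols introduced only for this example:

$$ t_{\text{full}} \propto N^3, \qquad t_{\text{blocked}} \propto \sum_{i=1}^{m} n_i^3 = m \left(\frac{N}{m}\right)^3 = \frac{N^3}{m^2} \quad \text{for } n_i = N/m $$

With $m = 4$ equal blocks, for instance, the total diagonalization work would drop by a factor of 16, and the blocks could additionally be handled by different MPI processes at the same time.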

The other most time-consuming part is the computation of two-electron integrals (SHELLXN step), which scales almost linearly with system size and is not affected by the MPI-per-irrep limitation.

The limitation becomes relevant when dealing with large systems, where the irreps (i.e., the block sizes) are large. In that case, the time spent on FDIK and PDIG becomes significant. Since Pcrystal cannot assign more than one MPI process per irrep, any additional processes will stay idle during diagonalization, but they are still used in other parts of the calculation.

Here, I've reported an example. You can see the blue line (FDIK, i.e., diagonalization) is negligible on the left (small systems) but becomes dominant on the right (larger systems). So the MPI limitation becomes noticeable only in that region.

If you're on the right side of the plot, where the time spent in FDIK exceeds that of other steps like SHELLXN or NUMDFT (DFT integration), you should limit your number of MPI processes using the rule:

$$ \text{max MPI processes} = n_{\text{spin}} \times \sum_{\mathbf{k}} n_{\text{irreps},\mathbf{k}} $$

That is, the total number of processes should not exceed the sum of all irreps across all k points, multiplied by the number of spins (2 if you're doing a spin-polarized calculation, 1 for closed-shell).

By default, the number of irreps is not printed in the output file. However, you can activate some advanced printing options to display this information. To do so, insert the following into the third block of your input file:

KSYMMPRT
SETPRINT
1
47 nk

Here, nk is the maximum number of k points for which you want to print symmetry information. Likely this will be the number of k points in your calculation.

You can find more information about this option in the CRYSTAL User Manual, on pages 117-118.

The output will look like this:

 +++ SYMMETRY ADAPTION OF THE BLOCH FUNCTIONS +++

 SYMMETRY INFORMATION:
 K-LITTLE GROUP: CLASS TABLE, CHARACTER TABLE.
 IRREP-(DIMENSION, NO. IRREDUCIBLE SETS)
 (P, D, RP, RD, STAND FOR PAIRING, DOUBLING, REAL PAIRING AND REAL DOUBLING
 OF THE IRREPS (SEE MANUAL))

 K[   1] (  0  0  0)                                                    <------ 1st k point

 CLASS | GROUP OPERATORS (COMPLEX PHASE IN DEGREES)
 ------------------------------------------------------------------------
   2   |   2(   0.0);
   3   |   3(   0.0);   4(   0.0);
   4   |   5(   0.0);   6(   0.0);
   5   |   7(   0.0);   8(   0.0);   9(   0.0);
   6   |  10(   0.0);  12(   0.0);  11(   0.0);

 IRREP/CLA     1     2     3     4     5     6
 ---------------------------------------------
  MULTIP |     1     1     2     2     3     3
 ---------------------------------------------
      1  |  1.00  1.00  1.00  1.00  1.00  1.00
      2  |  1.00 -1.00  1.00 -1.00 -1.00  1.00
      3  |  2.00  2.00 -1.00 -1.00  0.00  0.00
      4  |  1.00  1.00  1.00  1.00 -1.00 -1.00
      5  |  2.00 -2.00 -1.00  1.00  0.00  0.00
      6  |  1.00 -1.00  1.00 -1.00  1.00 -1.00

   1-(1,  27);   2-(1,  25);   3-(2,  37);   4-(1,  23);   5-(2,  37);  <------ This is the information about
   6-(1,  25);                                                          <------ the irreps

 K[   2] (  1  0  0)                                                    <------ 2nd k point.

 CLASS | GROUP OPERATORS (COMPLEX PHASE IN DEGREES)
 ------------------------------------------------------------------------
   2   |  11(   0.0);

 IRREP/CLA     1     2
 ---------------------
  MULTIP |     1     1
 ---------------------
      1  |  1.00  1.00
      2  |  1.00 -1.00

   1-(1, 126);   2-(1, 122);                                             <------ irreps of 2nd k point
....

For example, here the first k point is adapted into 6 different irreps, while the second is adapted into 2.

Please note: this feature should be run using one single MPI process.
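The counting can be scripted. The sketch below assumes that every irrep appears in the printout as an entry of the form `i-(dimension, sets);`, as in the sample above, and that the symmetry information was printed for all k points of the calculation. The helper names `count_irreps` and `max_mpi_processes` are hypothetical:

```shell
# Sketch: count irrep entries such as "3-(2,  37);" in a KSYMMPRT printout
# and apply the rule: max MPI processes = n_spin * (sum of irreps over k points).
# Assumption: count_irreps and max_mpi_processes are hypothetical helpers.
count_irreps() {
    # One match per irrep entry, summed over every k point in the file
    grep -oE '[0-9]+-\( *[0-9]+, *[0-9]+\);' "$1" | wc -l | tr -d ' '
}

max_mpi_processes() {
    local outfile="$1"
    local nspin="${2:-1}"   # 1 for closed-shell, 2 for spin-polarized
    echo $(( $(count_irreps "${outfile}") * nspin ))
}

# Usage:
#   max_mpi_processes mycalc.out 2
```

If the printout had contained only the two k points shown above, the count would be 6 + 2 = 8 irreps, giving at most 8 processes for a closed-shell calculation and 16 for a spin-polarized one.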

Giacomo Ambrogio
