1.6 Compiling faster versions of FHI-aims on specific platforms

FHI-aims is intended to be a Fortran-only code. For most of the code, this means that building the “fastest” version of FHI-aims on a given computer architecture is “only” a matter of finding the right Fortran compiler and compiler options for that processor. For some architectures, specific compiler options are collected in the FHI-aims club wiki. Please check there, and please add any useful information that you may find.

That said, one particularly performance-critical area for large systems is the Kohn-Sham eigenvalue solver. On parallel computers, FHI-aims solves this problem through the ELSI infrastructure and the ELPA library. ELPA allows its users to select platform-optimized linear algebra “kernels.” By default, FHI-aims uses a generic kernel that compiles with any Fortran compiler and gives reasonable speed. However, if one knows which specific processor one is using, it is possible to replace this kernel with an architecture-specific kernel and compile a faster version of ELPA into FHI-aims. This is possible, for example, for BlueGene/P, BlueGene/Q, Intel AVX and several other Intel architectures. For standard Intel x86 chips, there is even an “assembler”-based kernel that delivers fast performance regardless of which Fortran compiler is used.
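As a rough illustration of finding out which processor one is using, the following Python sketch reads the CPU feature flags that Linux exposes in /proc/cpuinfo and reports the widest SIMD family it finds. The function names and the returned labels (avx512, avx2, avx, sse, generic) are assumptions made for this example only. They are not ELPA or FHI-aims identifiers, and the actual kernel names and build options should be taken from the ELPA documentation.

```python
"""Sketch: guess the widest SIMD family of a Linux x86 CPU from /proc/cpuinfo."""


def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line, e.g. non-x86 or non-Linux input


def widest_simd_family(flags):
    """Map CPU flags to an illustrative label (NOT an ELPA kernel name)."""
    # Ordered from widest to narrowest; the first match wins.
    candidates = [
        ("avx512", "avx512f"),
        ("avx2", "avx2"),
        ("avx", "avx"),
        ("sse", "sse4_2"),
    ]
    for label, required_flag in candidates:
        if required_flag in flags:
            return label
    return "generic"


def detect_simd_family(path="/proc/cpuinfo"):
    """Read cpuinfo from disk; fall back to 'generic' if it is unavailable."""
    try:
        with open(path) as handle:
            return widest_simd_family(parse_cpu_flags(handle.read()))
    except OSError:
        return "generic"
```

Calling detect_simd_family() on a compute node (not on a login node, which may have a different processor) gives a starting point for deciding which architecture-specific kernel is worth trying.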

Note that this choice can matter. For example, the “generic” ELPA kernel produces fast code with the Intel Fortran compiler, but much slower code with certain versions of the PGI Fortran compiler (often found on Cray machines).

At this time, please ask (see below) about the most effective strategy to link against the “best” ELSI and ELPA libraries. In the ideal case, the user first builds separate (standalone) instances of ELPA and of ELSI and then links FHI-aims against them. The extra effort can be very worthwhile.

When using an external ELPA build, it is very important to build ELPA on the exact node used for computation. With ELPA versions 2023 and later, building it elsewhere has been observed to cause numerical inconsistencies in calculations.
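If it is unclear whether a build host and a compute node really have the same processor, one simple sanity check is to compare their CPU model and feature flags. The Python sketch below assumes that the contents of /proc/cpuinfo have been saved to a text file on each node; the function names are hypothetical and chosen for this example. A clean comparison is only a necessary condition and does not by itself guarantee numerically consistent results.

```python
"""Sketch: compare the CPU identity of a build node and a compute node."""


def cpu_fingerprint(cpuinfo_text):
    """Return (model name, frozenset of flags) from the first CPU entry."""
    model, flags = None, frozenset()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or line without a key/value pair
        key = key.strip()
        if key == "model name" and model is None:
            model = value.strip()
        elif key == "flags" and not flags:
            flags = frozenset(value.split())
    return model, flags


def compare_nodes(build_text, compute_text):
    """List the differences found; an empty list means none were detected."""
    build_model, build_flags = cpu_fingerprint(build_text)
    compute_model, compute_flags = cpu_fingerprint(compute_text)
    problems = []
    if build_model != compute_model:
        problems.append(f"CPU model differs: {build_model!r} vs {compute_model!r}")
    missing = sorted(build_flags - compute_flags)
    if missing:
        problems.append("flags missing on compute node: " + " ".join(missing))
    extra = sorted(compute_flags - build_flags)
    if extra:
        problems.append("flags only on compute node: " + " ".join(extra))
    return problems
```

For example, after copying /proc/cpuinfo from each node into two files (any file names will do), pass their contents to compare_nodes() and inspect the returned list before linking FHI-aims against the external ELPA build.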