G.3 Installation

Compiling the FHI-aims executable with support for GPU acceleration requires CMake 3.8 or later. A set of GPU acceleration flags must be included in the CMake initial cache file; the installation procedure is otherwise the same as for a standard FHI-aims installation. The GPU code has only been tested for the scalapack.mpi target.

G.3.1 Example initial_cache.cmake file for GPU Acceleration

Note: The example below, created for ‘node18’ of the Timewarp cluster at Duke University, covers the GPU-related flags in initial_cache.cmake and may be used as a template for the user’s own initial_cache.cmake. We encourage users to add their own GPU acceleration compilation flags to the “Known compiler settings” section of the FHI-aims GitLab wiki (https://aims-git.rz-berlin.mpg.de/aims/FHIaims/wikis).

Architecture of Timewarp ‘node18’:

  • CentOS Linux 7

  • Intel Xeon Silver 4114 CPU 2.20GHz

  • NVIDIA Titan V100

  • Intel Fortran 18.0.2

  • CUDA 10.1

  • CMake 3.17.3

initial_cache.cmake:

set(CMAKE_C_COMPILER icc CACHE STRING "")
set(CMAKE_C_FLAGS "-O3 -ip -fp-model precise -DNDEBUG -std=gnu99" CACHE STRING "")
set(CMAKE_Fortran_COMPILER mpiifort CACHE STRING "")
set(CMAKE_Fortran_FLAGS "-O3 -ip -fp-model precise" CACHE STRING "")
set(Fortran_MIN_FLAGS "-O0 -fp-model precise" CACHE STRING "")

set(USE_CUDA ON CACHE BOOL "")
set(CMAKE_CUDA_FLAGS "-O3 -DAdd_ -arch=sm_70" CACHE STRING "")

set(LIB_PATHS "$ENV{MKLROOT}/lib/intel64 $ENV{CUDA_HOME}/lib64" CACHE STRING "")
set(LIBS "cublas cudart mkl_scalapack_lp64 mkl_blacs_intelmpi_lp64
    mkl_intel_lp64 mkl_sequential mkl_core" CACHE STRING "")
  • How to determine the -arch or -gencode CUDA flags? Your CUDA installation should come with a utility called deviceQuery, located in samples/1_Utilities/deviceQuery in the CUDA root directory. Copy that directory into your work directory, change into it, and build it, editing the Makefile if the build fails. Then run the resulting deviceQuery executable. The line “CUDA Capability Major/Minor version number” contains the relevant information: for example, if it reports 6.0, use sm_60 with the -arch or -gencode flags.

  • How to choose between multiple GPU cards installed on the system? Set the environment variable CUDA_VISIBLE_DEVICES to the index (or comma-separated indices) of the card(s) that should be visible to the program.

  • When using gfortran with CMake 3.8 or later but earlier than 3.11, the executable may try to link to the wrong libgfortran library at runtime. This can be prevented by pointing CUDA_LINK_DIRS to the directories that contain the CUDA libraries (e.g., “/opt/cuda/lib64/stubs /opt/cuda/lib64”). For more information, see gitlab.kitware.com/cmake/cmake/issues/17792.
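
As an illustration of the notes above, the following fragment sketches how the corresponding lines of initial_cache.cmake might look for a card that deviceQuery reports as compute capability 6.0, built with gfortran and an affected CMake version. The value 6.0 and the /opt/cuda paths are simply the examples quoted in the notes, not settings for any particular machine. Declaring CUDA_LINK_DIRS as a cache string is an assumption made by analogy with the other variables in this section.

```cmake
# Sketch only: values are the examples from the notes above, not tested settings.
# deviceQuery reported "CUDA Capability Major/Minor version number: 6.0",
# so the target architecture is sm_60.
set(USE_CUDA ON CACHE BOOL "")
set(CMAKE_CUDA_FLAGS "-O3 -DAdd_ -arch=sm_60" CACHE STRING "")

# Alternative using the nvcc -gencode syntax, generating device code for sm_60:
# set(CMAKE_CUDA_FLAGS "-O3 -DAdd_ -gencode arch=compute_60,code=sm_60" CACHE STRING "")

# Only needed for gfortran with CMake >= 3.8 and < 3.11: point CUDA_LINK_DIRS
# at the directories holding the CUDA libraries (example paths; adjust them).
# Assumption: the variable is declared like the other cache strings here.
set(CUDA_LINK_DIRS "/opt/cuda/lib64/stubs /opt/cuda/lib64" CACHE STRING "")
```

Only one of the two CMAKE_CUDA_FLAGS lines should be active at a time; the -arch form matches the example file above.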

G.3.2 Example initial_cache.cmake file when using HIP (EXPERIMENTAL!)

This subsection gives an example initial_cache.cmake for building with HIP. The example was tested on node ‘nid005000’ of the GPU Early Access Platform (EAP) of the LUMI cluster. Compiling the FHI-aims executable with HIP support requires CMake 3.21 or later. This example uses ROCm v5.1.4.

Architecture of LUMI EAP ‘nid005000’:

  • SLES 15.3

  • AMD EPYC 7A53 CPU 2.1GHz

  • AMD MI250

  • GNU Fortran 11.2.0

  • HIP 5.1.4

  • CMake 3.23.2

initial_cache.cmake:

set(CMAKE_Fortran_COMPILER "ftn" CACHE STRING "")
set(CMAKE_Fortran_FLAGS "-O2 -ftree-vectorize -funroll-loops
     -fallow-argument-mismatch -ffree-line-length-none" CACHE STRING "")
set(Fortran_MIN_FLAGS "-O0 -ffree-line-length-none" CACHE STRING "")
set(CMAKE_C_COMPILER "cc" CACHE STRING "")
set(CMAKE_C_FLAGS "-O2" CACHE STRING "")

set(USE_HIP ON CACHE BOOL "")
set(SET_HIP_ARCH "gfx90a" CACHE STRING "" FORCE)
set(CMAKE_HIP_FLAGS "-O2 -DAdd_ -std=gnu++17 " CACHE STRING "")
set(HIP_LINK_DIRS "$ENV{ROCM_PATH}/llvm/lib" CACHE STRING "")

set(LIBS "hipblas" CACHE STRING "")
  • By default, CMake tries to find the Clang compiler. To use a different compiler, for example hipcc, add the following line to initial_cache.cmake:

    set(SET_HIP_COMPILER "hipcc" CACHE STRING "" FORCE)
    
  • Do not use CMAKE_HIP_COMPILER to select the HIP compiler; it is not supported by the FHI-aims CMake script. Likewise, the HIP architecture must be set through SET_HIP_ARCH; otherwise an error can occur.

  • The line that sets HIP_LINK_DIRS prevents the compiler from linking against an old version of libgfortran.

  • In some cases, amdhip64 also needs to be added to LIBS.

  • HIP support for the ELPA eigensolver is not yet available.

  • Experience has shown that the default batch size of 200 can be too small to get the best performance from AMD hardware. On the MI250 GPUs of the LUMI cluster, setting both batch_size_limit and points_in_batch to values as large as 1200 seems to give the best performance. In one example, this reduced the integration time to one third of that obtained with the default batch size values.
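
The build-related notes above can be collected into a short sketch of optional additions to the HIP initial_cache.cmake. All variable names come from this subsection. Treating a failed link step as the cue for adding amdhip64 is an assumption, since the notes do not say when it is needed.

```cmake
# Sketch only: optional additions to the HIP example, following the notes above.

# Use hipcc instead of the Clang compiler that CMake looks for by default.
# (Do not set CMAKE_HIP_COMPILER; the FHI-aims CMake script does not support it.)
set(SET_HIP_COMPILER "hipcc" CACHE STRING "" FORCE)

# The HIP architecture must be set explicitly; gfx90a is the value used
# in the LUMI MI250 example.
set(SET_HIP_ARCH "gfx90a" CACHE STRING "" FORCE)

# Assumption: add amdhip64 only if linking fails with "hipblas" alone.
set(LIBS "hipblas amdhip64" CACHE STRING "")
```

If used, the LIBS line replaces the one in the example file rather than being added alongside it.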

WARNING: This is an experimental release with known issues still to be solved. Using this code for production runs is not advised. Check your numbers!