4.1 Ground state DFT: Total energies and relaxation

As a first and basic example, we discuss how to set up a simple DFT total-energy calculation for a given structure in FHI-aims. We here expand on the example provided as a testrun: The relaxation of a H2O molecule towards its minimum-energy structure. The relevant input files geometry.in and control.in are included in the directory testcases/H2O-relaxation (see Chapter 1).

The starting geometry here is badly distorted H2O, with an initial bond angle of 90. For any relaxation runs, we strongly recommend a two-step procedure: First, pre-relax the structure with light settings down to (say) 102 eV/Å or so, and only then follow up with a post-relaxation run using tight settings or anything else. light calculations may easily be cheaper by a factor of 5-10 than tight ones, and going down a relaxation trajectory of any length can then be a trememdous waste of computational resources.

The test case below only includes the quick but safe prerelaxation step, leading to an improved geometry that is the optimum using light settings. Based on this resulting geometry, the postrelaxation step with tight settings should be simple follow-up exercise.

Input files

Turning first to geometry.in, we see that the basic geometry input for FHI-aims is very simple in most cases: atom lines that contain nuclear coordinates in Å, together with the appropriate species designation (in this case, H and O). For a spin-polarized calculation, one might additionally want to specify initial spin moments for selected atoms using the initial_moment keyword, in order to define a good initial spin density guess for the s.c.f. procedure.

The input file control.in contains all necessary computational information regarding the desired run. Most importantly, the xc keyword is required to specify the exchange-correlation functional; FHI-aims will not proceed without this information. The further “physical” specifications – spin, relativistic, and charge – are all at their default settings (no spin-polarization, no relativity, and no charge), but are listed explicitly to make them visible at a quick glance. This is especially important for the relativistic keyword, where the none setting would not be justified for heavier elements (see Sec. 4.2).

The next setting, relax_geometry, specifies a geometry relaxation using the bfgs algorithm, together with a standard convergence criterion for the forces in the final geometry: No force component for any atom of the relaxed structure should exceed 10-2 eV/Å. This criterion may well be set tighter for sensitive cases, such as a starting geometry to obtain vibrational frequencies, but not orders of magnitude tighter (for example, do not simply use a setting of 10-4 eV/Å because it feels more accurate – it will only end up probing some irrelevant numerical traits of the energy surface, for example from the finite integration grids, with no noticeable geometry or total energy changes resulting at all).

The version of the BFGS optimization algorithm used here is a trust-radius enhanced version (as compared to a straight, textbook-like BFGS implementation which could alternatively be used, see the description of the relax_geometry keyword). By default, the convergence of the relaxation is additionally sped up by an intelligent guess for the Hessian matrix used in the initial BFGS step. This is done by way of a slightly modified version of the general purpose model matrix proposed by Lindh and coworkers [199], see keyword init_hess.

The control.in could also include other general keywords concern the technicalities of obtaining self-consistency, which are not mandatory and are therefore not included in this simple test case. Examples include a broadening of occupation numbers around the Fermi level using the occupation_type keyword (this has no physical impact in a molecule with a HOMO-LUMO gap but may be important in metals), a mixing parameter charge_mix_param for the mixer (nonlinear optimization of the s.c.f. cycle), and convergence criteria for the s.c.f. cycle. FHI-aims attempts to choose reasonable settings for these aspects automatically in order to help avoid lengthy mistakes. The s.c.f. criterion for density convergence, sc_accuracy_rho in particular is set tightly by default, in order to have forces that are already mostly converged when the (more expensive) force computation is first done.

In control.in, it remains to set the species information for H and O, the elements included in control.in. Normally, these settings should be obtained by copy-pasting the relevant information from the species_defaults directory, for example the choices in the light or tight subdirectory located there. Once this is done, the species defaults may still be adapted for the purpose in question.

Output stream

We next analyze some significant parts of the standard output produced by FHI-aims, also provided in the file H2O.reference.out. We emphasize that this output is kept as human-readable as possible; it pays to actually look into the output, especially when something does not appear to have gone correct. Often, a simple warning in in the initial input section or elsewhere in the file may already tell you what is going on. Warnings can also be identified by “grepping” for asterisks in the file.

The standard output stream is structured as follows:

  • A summary of the setup (nodes used, required fixed dimensions, information in control.in and geometry.in) and, importantly, default values inserted for parameters that were not explicitly specified in control.in.

  • Preparation of fixed parts of the calculation – most importantly, information regarding the setup of the per-species basis

  • Initialization – information on the setup of all three-dimensional integrations, and solution of the first Kohn-Sham eigenvalue problem for the initial superposition-of-free-atoms electron density.

  • Process and total energy information for each successive s.c.f. iteration.

  • Upon convergence of the s.c.f. cycle, the Kohn-Sham eigenvalues are also included, as well as final total energies and forces in a long format for reading by external utilities (scripts etc.)

  • This is followed by information on the geometry optimization, up to the coordinates predicted for the next step, based on the converged forces obtained from the previous iteration.

  • Reinitialization, s.c.f., and geometry optimization information is repeated for each successive geometry step until the optimum geometry is found within the force tolerance specified by relax_geometry.

Key information that should be paid attention to is the following:

Eigenvalue and total energy information:
This should be checked for roughly realistic total energy values, a reasonable eigenvalue spectrum: Make sure that the expected core states, the correct number of eigenstates, and possibly the expected HOMO-LUMO gap are all in place.

As in all other iterations, three variants of the total energy are given (here quoted from the initialization step, just prior to the start of the s.c.f. cycle for the initial geometry):

  | Total energy                  :         -76.44577378 Ha       -2080.19534383 eV
  | Total energy, T -> 0          :         -76.44577378 Ha       -2080.19534383 eV
  | Electronic free energy        :         -76.44577378 Ha       -2080.19534383 eV

For systems with a HOMO-LUMO gap, these values should all be the same, but for systems with fractional occupation numbers (metallic systems, large occupation_type setting, or degenerate levels at the Fermi level), this is not generally the case.

For molecular systems, the meaning of fractional occupation numbers is questionable, and we recommend to always rely on the first value, the “Total energy”.

For metallic systems, fractional occupation numbers are unavoidable. However, it is possible to estimate the total energy for zero occupation broadening (“Total energy, T > 0”) based on an entropy expression associated with the fractional occupation numbers. This estimate is based on a free electron gas argument; do not trust it for atoms or small molecules. Bulk metals or metallic surfaces benefit from this correction, and their total energy may be estimated from the extrapolated total energy (“Total energy, T > 0”).

The “free energy” simply sums the total energy together with the full entropy correction from the fractional occupation numbers. This quantity has no physical meaning, but it is strictly this free energy to which our calculated forces correspond. The reason is that fractional occupation numbers may carry an implicit derivative of their own with respect to the nuclear coordinates. This would enter the total energy, but this derivative is not available in simple analytical terms in FHI-aims.

Geometry information:
After each relaxation step, the convergence of the present geometry is checked (by monitoring the maximum remaining force component on any atom in the structure), and the updated geometry to be treated next is printed. For instance, the final (converged) geometry can be found near the end, in a format that is directly suitable for a follow-up run:

------------------------------------------------------------
  Final atomic structure:
                         x [A]             y [A]             z [A]
            atom         0.00000000       -0.07327020       -0.00000000  O
            atom         0.76741277       -0.67036490        0.00000000  H
            atom        -0.76741277       -0.67036490        0.00000000  H
------------------------------------------------------------

By default, a version of this information is also written into a file geometry.in.next_step, which should be used (together with the hessian.aims file) to continue a relaxation run and also to begin a possible post-relaxation with tight settings (the hessian.aims file contains a usually much better Hessian for the structure at hand than the starting guess).

Timing information:
Finally, we note that FHI-aims also provides detailed timing information as well as partial memory accounting for each s.c.f. iteration, and as a summary at the end of each run. For example, the provided test run reads similar to this:

------------------------------------------------------------
          Leaving FHI-aims.
          Date     :  20200114, Time     :  104524.650

          Computational steps:
          | Number of self-consistency cycles          :           49
          | Number of SCF (re)initializations          :            5
          | Number of relaxation steps                 :            4

          Detailed time accounting                     :  max(cpu_time)    wall_clock(cpu1)
          | Total time                                 :        3.228 s           3.255 s
          | Preparation time                           :        0.067 s           0.067 s
          | Boundary condition initalization           :        0.000 s           0.000 s
          | Grid partitioning                          :        0.041 s           0.041 s
          | Preloading free-atom quantities on grid    :        0.028 s           0.028 s
          | Free-atom superposition energy             :        0.047 s           0.048 s
          | Total time for integrations                :        1.394 s           1.404 s
          | Total time for solution of K.-S. equations :        0.032 s           0.030 s
          | Total time for EV reorthonormalization     :        0.001 s           0.001 s
          | Total time for density & force components  :        0.986 s           0.991 s
          | Total time for mixing                      :        0.063 s           0.068 s
          | Total time for Hartree multipole update    :        0.050 s           0.049 s
          | Total time for Hartree multipole sum       :        0.400 s           0.402 s
          | Total time for total energy evaluation     :        0.008 s           0.014 s

          [...]

          Have a nice day.
------------------------------------------------------------

The date and time at the end are in the ddmmyyyy and hhmmss.mmm formats of a wall-clock time, not in seconds; i.e., the above calculation did not take 104524 s, but rather ended at 10:45:24 h, one fine January 14.

In addition, detailed timing is provided both for as elapsed CPU time (on individual CPUs during the run), and actual elapsed wall clock time. In a normal production run, no significant discrepancies should occur between wall clock and CPU times. If discrepancies arise, they could indicate serious problems with load balancing or communication in a parallel run.

Further analysis

Refer to caption
Figure 4.1: Total energy convergence (left y axis) and force convergence (right y axis) during the relaxation test run of H2O. A total of five steps is taken (initial geometry plus four relaxation steps). The total energy of the final step is taken as the reference here, and the total energy in the second-to-last step is already identical within the numerical accuracy of the calculation. The final step is only taken to bring the residual forces (total energy gradients) to practically zero as well.

Some statistics on the complete relaxation run can be obtained using the script
get_relaxation_info.pl, located in the utilities subdirectory of the FHI-aims distribution. This script is invoked as follows:

  > get_relaxation_info.pl < H2O.reference.out > statistics.dat

or with any other output file. The script searches the FHI-aims output file for total energies (no entropy correction) and maximum force components at the end of each relaxation step. The development of the total energy, total energy difference with respect to the starting geometry, and maximum force component can then be visualized as a function of the relaxation step number, using standard tools such as xmgrace. Fig. 4.1 visualizes the progress of the relaxation run based on the data obtained from get_relaxation_info.pl.

Likewise, much other information can and should be extracted from the standard output using similar scripts. For example, the development of the geometry can be visualized in the format of a .xyz file using the create_xyz_movie.pl script, using standard visualization tools such as jmol or vmd.

To continue a relaxation run (for instance, for a post-relaxation step with tight settings, see below), use the files geometry.in.next_step and hessian.aims that are created by default during a relaxation. They contain the atomic structure of the current relaxation step and the current estimate of the Hessian matrix, respectively, as created by the bfgs relaxation algorithm. This output can be fine-tuned (or prevented) using the write_restart_geometry and hessian_to_restart_geometry keywords.

Obviously, all intermediate geometries (but not the Hessian) are also part of the standard output stream by default, in the right format for a restart.

Next steps

As with any electronic structure code, monitoring the convergence of the basis set for each element is an important user task, to make sure any physical conclusions are accurate. For instance, the present example could be continued as follows:

  • A “prerelaxation” such as the present test run should not employ a huge basis set, simply for efficiency’s sake. The light settings used for the present elements use a tier 1 basis set. Very often, geometries obtained with light are already rather close to converged.

  • For these same elements, you will find that the tight settings actually employ tier 2, which is significantly larger and very close to basis set convergence at least for DFT methods.

  • Obviously, one might extend this to tier 3 or higher for test purposes, possibly even based on really_tight settings otherwise. Comparing the changes made between light, tight, and really_tight settings, and different basis sets, is an interesting exercise. For the H2O molecule shown here, it should hopefully reveal that there is not much to be gained beyond what is prescribed as “tight”.

Finally, we note that “tier 1” does not necessarily mean the same level of convergence accuracy for all elements. For light elements, tier 2 may often be needed, and we set them by default in our tight settings. For significantly heavier elements (transition metals in particular), tier 1 is already well converged for ground-state DFT calculations, which is therefore mostly the default in our tight settings.