E.5 TREXIO - An interoperable wave function file format
Purpose
TREXIO [253] is a file format and library for storing quantum chemistry wave functions. It is developed as part of the TREX (Targeting Real chemical accuracy at the EXascale) center of excellence funded by the European Union. The general idea is to store the wave function in an interoperable format so that it can be processed and refined by different codes, each developed by specialists for its task, instead of relying on monolithic codes that try to perform every step themselves.
An example workflow in this spirit: a Hartree-Fock calculation is performed with FHI-aims and the wave function is exported into a TREXIO file. This file can then be read by a Configuration Interaction (CI) code such as Quantum Package (also developed by the TREX-CoE [103]). Through a flavour of CI, a multideterminant wave function is generated, which can then be used as a trial wave function in a Diffusion Monte Carlo (DMC) calculation with QMC=Chem (likewise developed by the TREX-CoE [271]). Such a workflow makes it possible to use the NAO basis sets of FHI-aims with many different codes without changing the individual codes or writing separate interfaces between them. A growing number of codes provide interoperability with TREXIO, e.g. PySCF [289] or iPie [207].
TREXIO supports different back ends for the wave function file. The simplest of these is
the text format, though it is not recommended in production use since the data is not protected
and I/O operations are much slower. The standard option is HDF5, which will be automatically
used if the library is linked against HDF5 at compile time. This format stores
the data in a single, binary file for faster and easier manipulation.
Both formats offer the same functionality. The back end can be set explicitly
using the trexio_export back_end keyword in control.in.
Please note: If you export integral values, the default accuracy of the auxiliary basis used in the resolution of identity is not sufficient. It is strongly recommended that the settings for the product basis are as tight as possible. See the example below for possible values and how to verify the settings are sufficiently tight.
TREXIO is open source and users are encouraged to expand its functionality to match their own needs. If you would like to add fields for your own data to the file format definition, please refer to the information for developers in the TREXIO documentation and GitHub repository.
Setup
To compile FHI-aims with TREXIO support, the flag
USE_TREXIO ON
must be set in the initial CMake cache. This works out of the box using
the version of TREXIO currently provided with FHI-aims. By default,
only the text output format is supported. To link against HDF5, the CMake
variable HDF5_ROOT must be set to the
root directory of your HDF5 installation in the initial cache.
Alternatively, it is loaded from the environment variable of the same
name. Compilation with HDF5 support is strongly recommended if available.
In many cases, it is advantageous to install TREXIO as a shared library on
your system. In this case,
you can link against the existing binary by providing the variable
TREXIO_DIR in the initial CMake cache
or as an environment variable. This variable points to the root directory of
the TREXIO installation.
To see which of these options is used, please refer to the CMake output when you load the initial CMake cache.
This functionality is completely separate from other HDF5 functionality
within FHI-aims, which is controlled by USE_HDF5
instead.
Usage examples
As a simple example, a CCSD(T) calculation of the water molecule with an NAO basis set is performed in PySCF. For this calculation, three steps are necessary: First, the Hartree-Fock orbitals are calculated using FHI-aims and exported into a TREXIO file. Secondly, this file is converted to the FCIDUMP format, which is a simple text file storing only the data necessary for a post-Hartree-Fock calculation. Finally, this file is read in using PySCF to perform the CCSD(T) calculation.
To perform the transformation from TREXIO to FCIDUMP, the Python package trexio-tools is used. It is a set of scripts that perform a range of simple operations on the stored wave function, or convert between different formats if the targeted program does not read TREXIO files directly.
1 — TREXIO file generation
To run the example calculation, you can use the following geometry.in and control.in files.
#geometry.in
atom  0.00000000 0.00000000 -0.00614048 O
atom  0.76443318 0.00000000  0.58917024 H
atom -0.76443318 0.00000000  0.58917024 H
#control.in
xc hf
calculate_all_eigenstates
trexio_export basis
trexio_export integrals all 1e-10
trexio_export filename trexfile
default_prodbas_acc 1e-10
default_max_l_prodbas 40
default_max_n_prodbas 40
relativistic none
Here, tight settings have been used for the product basis. Add a basis set specification of your choice; for testing, very tight tier 1 is recommended. To enable the integration of the large angular momenta in the product basis, it might be necessary to increase the density of the angular grid.
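To check how sensitive the exported integrals are to the product basis settings, it can help to generate several control.in variants and compare the results. The following is a minimal sketch under stated assumptions: the helper names control_in_text and write_control_in and their argument names are hypothetical, the keywords are copied from the control.in example above, and the value after "trexio_export integrals all" is assumed to be the integral threshold. The basis set specification still has to be appended separately.

```python
from pathlib import Path


def control_in_text(prodbas_acc="1e-10", max_l="40", max_n="40",
                    integral_threshold="1e-10"):
    """Return the control.in keywords of the example as one string.

    Hypothetical helper: values are passed as strings and written verbatim.
    The basis set specification is NOT included and must be appended.
    """
    lines = [
        "xc hf",
        "calculate_all_eigenstates",
        "trexio_export basis",
        # Assumption: the trailing number is the integral threshold.
        f"trexio_export integrals all {integral_threshold}",
        "trexio_export filename trexfile",
        f"default_prodbas_acc {prodbas_acc}",
        f"default_max_l_prodbas {max_l}",
        f"default_max_n_prodbas {max_n}",
        "relativistic none",
    ]
    return "\n".join(lines) + "\n"


def write_control_in(directory, **settings):
    """Write control.in into `directory` (created if missing); return its path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / "control.in"
    target.write_text(control_in_text(**settings))
    return target
```

For instance, write_control_in("acc_1e-8", prodbas_acc="1e-8") would prepare a looser variant in its own directory, so that each setting can be run and compared independently.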
Run the calculation. The trexfile appears alongside the other output files. If the HDF5 back end is used, a single file is generated. For the text back end, the output is a directory including several text files.
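Because the two back ends differ in their on-disk layout, a script can guess which one was used before processing the output. This is a minimal sketch that relies only on the file-versus-directory distinction described above; the function name guess_trexio_backend is hypothetical, and the check does not inspect the contents.

```python
from pathlib import Path


def guess_trexio_backend(path):
    """Guess the back end from the layout of the TREXIO output.

    Returns "text" for a directory, "hdf5" for a single file,
    and None if nothing exists at `path`. The contents are not validated.
    """
    p = Path(path)
    if p.is_dir():
        return "text"
    if p.is_file():
        return "hdf5"
    return None


# "trexfile" is the file name set in the control.in example.
print(guess_trexio_backend("trexfile"))
```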
2 — File conversion
To perform this step, please install the trexio-tools Python package with a package manager of your choice (e.g. pip). You can also install it manually from the GitHub repository. The package features a number of useful scripts. Although it is a Python package, it is invoked as a command on the command line. For now, run the following in your calculation directory:
trexio convert-to -t fcidump -o h2o.fcidump trexfile
This converts your TREXIO file into an FCIDUMP, which is an older file format that stores the minimal information necessary for post-Hartree-Fock methods.
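If the conversion is part of a scripted workflow, the same command can be launched from Python. The sketch below only wraps the command line shown above; the function names fcidump_command and convert_to_fcidump are hypothetical, and it assumes that the trexio executable installed by trexio-tools is on the PATH.

```python
import shutil
import subprocess


def fcidump_command(trexio_file="trexfile", fcidump_file="h2o.fcidump"):
    """Build the argument list for the conversion command shown above."""
    return ["trexio", "convert-to", "-t", "fcidump",
            "-o", fcidump_file, trexio_file]


def convert_to_fcidump(trexio_file="trexfile", fcidump_file="h2o.fcidump"):
    """Run the conversion; raises if the command is missing or fails."""
    if shutil.which("trexio") is None:
        raise RuntimeError("trexio command not found; is trexio-tools installed?")
    subprocess.run(fcidump_command(trexio_file, fcidump_file), check=True)
```

Passing the arguments as a list avoids shell quoting issues if the file names contain spaces.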
3 — CCSD(T) calculation with FHI-aims integrals
Install PySCF if it is not yet available on your machine. You can then run the following script either from a file or interactively:
#Python script to perform a CCSD(T) calculation from a FCIDUMP file
import pyscf
import pyscf.cc
import pyscf.tools.fcidump
# Load the FCIDUMP
mf = pyscf.tools.fcidump.to_scf("h2o.fcidump")
# Run Hartree-Fock with molecular orbitals to set
# up the scf object
hf_energy = mf.kernel()
# Initialize CCSD object
mcc = pyscf.cc.CCSD(mf)
# Run CCSD as starting point
ccsd_correction = mcc.kernel()[0]
# Calculate perturbative triples correction
perturbative_correction = mcc.ccsd_t()
# PySCF returns energies in Hartree; the SCF energy is converted to eV
print("SCF energy (eV): ", 27.2113845*hf_energy,
      "\t(Compare to FHI-aims value!)")
print("Total CCSD(T) energy (Hartree): ", hf_energy
      + ccsd_correction + perturbative_correction)
The Hartree-Fock energy produced by this script can be compared to that given by FHI-aims to assess whether the product basis is sufficiently large for the desired application or whether the integral threshold should be lowered.
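This comparison can be automated with a few lines. The following is a minimal sketch: the function names and the default tolerance of 1e-4 eV are hypothetical choices for illustration, not recommended values, and the conversion factor is the one used in the script above. The FHI-aims energy is assumed to be given in eV.

```python
HARTREE_TO_EV = 27.2113845  # same conversion factor as in the script above


def hf_energy_mismatch_ev(pyscf_hf_hartree, aims_hf_ev):
    """Absolute difference (in eV) between the PySCF and FHI-aims HF energies."""
    return abs(pyscf_hf_hartree * HARTREE_TO_EV - aims_hf_ev)


def product_basis_is_tight_enough(pyscf_hf_hartree, aims_hf_ev, tol_ev=1e-4):
    """True if the two HF energies agree within `tol_ev`.

    The default tolerance is an arbitrary placeholder; choose it according
    to the accuracy required by the application.
    """
    return hf_energy_mismatch_ev(pyscf_hf_hartree, aims_hf_ev) <= tol_ev
```

If the check fails, tightening the product basis settings or the integral threshold and repeating the export is the natural next step.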