4.10 Script-based parallel tempering (a.k.a. replica exchange)
A script-based parallel tempering implementation is available. Part of the script depends on the particular batch-queueing system in use; the distribution provides a version that has been tested on Linux machines with the SGE batch-queueing system. The overall structure of the batch script does not change with the batch-queueing system, but a few crucial lines might need to be adapted.
4.10.1 Usage
In order to run the parallel tempering the batch script “submit.rex” must be submitted to the queueing system. The batch script:
1. creates a subdirectory “rex_??” for each replica,
2. copies the files needed for the FHI-aims run and runs them,
3. manages the swaps between replicas,
4. prints the outputs.
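How the swaps in step 3 are decided is not spelled out in this section. As a rough illustration only, the sketch below assumes the standard Metropolis criterion of parallel tempering and a square-root rescaling of the velocities after a temperature change. It is not taken from rex.AIMS.pl; the function names, the use of eV and K, and the choice of the potential energy as the swap energy are assumptions made for this example.

```python
import math
import random

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K (assumed units)


def swap_probability(e_i, e_j, t_i, t_j, k_b=K_B_EV_PER_K):
    """Standard Metropolis probability for exchanging two replicas.

    e_i, e_j are the energies of the replicas currently at temperatures
    t_i, t_j. P = min(1, exp[(beta_i - beta_j) * (e_i - e_j)]).
    """
    beta_i = 1.0 / (k_b * t_i)
    beta_j = 1.0 / (k_b * t_j)
    exponent = (beta_i - beta_j) * (e_i - e_j)
    if exponent >= 0.0:  # avoids overflow in exp for large positive values
        return 1.0
    return math.exp(exponent)


def attempt_swap(e_i, e_j, t_i, t_j, rng=random):
    """Return True if the swap is accepted in this (random) trial."""
    return rng.random() < swap_probability(e_i, e_j, t_i, t_j)


def velocity_scale(t_old, t_new):
    """Factor applied to all velocities when a replica moves from t_old to t_new."""
    return math.sqrt(t_new / t_old)
```

For instance, `velocity_scale(100.0, 110.0)` evaluates to sqrt(1.1). The vfact value 1.04880884817015 in the sample log_rex of Section 4.10.2 equals sqrt(1.1), which is compatible with a rescaling of this square-root form.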
The files that have to be present in the working directory are:
control.in.basic
control.in.rex
geometry.in.basic
optional: list_of_geometries
rex.AIMS.pl
submit.rex
The last two files are provided with the distribution and are contained in the subdirectory utilities/REX.
• control.in.rex must contain the following lines:
  n_rex      number of replicas
  temps      list of target temperatures, separated by spaces; the number of temperatures must agree with n_rex
  freq       time interval between replica exchange swaps, in ps, as in control.in
  MAX_steps  maximum number of replica exchange steps (i.e., the whole simulation will contain MAX_steps*freq ps per replica)
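As an illustration of these four keywords, the following sketch parses a control.in.rex given as a string and checks that the number of temperatures agrees with n_rex. The parser is hypothetical and independent of rex.AIMS.pl. The temperatures match the four-replica example log of Section 4.10.2, while the freq and MAX_steps values are made up for the example.

```python
def parse_rex_control(text):
    """Parse the control.in.rex keywords into a dict and validate them."""
    settings = {}
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        key, values = fields[0], fields[1:]
        if key == "n_rex":
            settings["n_rex"] = int(values[0])
        elif key == "temps":
            settings["temps"] = [float(v) for v in values]
        elif key == "freq":
            settings["freq"] = float(values[0])
        elif key == "MAX_steps":
            settings["MAX_steps"] = int(values[0])
    missing = {"n_rex", "temps", "freq", "MAX_steps"} - set(settings)
    if missing:
        raise ValueError(f"missing keywords: {sorted(missing)}")
    if len(settings["temps"]) != settings["n_rex"]:
        raise ValueError("number of temperatures does not match n_rex")
    return settings


# Illustrative file content; freq and MAX_steps are invented values.
EXAMPLE = """\
n_rex 4
temps 100.0 150.0 200.0 250.0
freq 0.1
MAX_steps 1000
"""

settings = parse_rex_control(EXAMPLE)
total_ps = settings["MAX_steps"] * settings["freq"]  # simulated time per replica
print(f"{total_ps:.1f} ps per replica")
```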
• control.in.basic, as in FHI-aims. Note, though, that the script deletes any keywords related to geometry relaxation and MD, with the exception of MD_time_step, and appends the MD settings for the replica exchange to the end of the control.in in each subdirectory. In detail, the following lines are managed by the script:
MD_run $t NVT_parrinello $temp[$i+1] 0.1
MD_MB_init $temp[$i+1]
MD_restart .true.
MD_clean_rotations .true.
output_level MD_light
where $t is a multiple of the “freq” keyword in control.in.rex, updated at each MD substep between swaps, and $temp[$i+1] is the target temperature for the particular replica and parallel tempering step. These lines are hard-coded in the Perl script rex.AIMS.pl.
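The handling of control.in.basic described above can be sketched as follows. This is an illustrative reimplementation, not the code of rex.AIMS.pl. In particular, the exact set of keywords the script removes is not listed here, so the filter below (any keyword starting with MD_ except MD_time_step, plus relax_geometry) is an assumption, as are the function names.

```python
def md_lines(sub_step, freq_ps, temperature):
    """MD settings appended for one replica; t is a multiple of freq."""
    t = sub_step * freq_ps
    return [
        f"MD_run {t:g} NVT_parrinello {temperature} 0.1",
        f"MD_MB_init {temperature}",
        "MD_restart .true.",
        "MD_clean_rotations .true.",
        "output_level MD_light",
    ]


def build_replica_control(basic_text, sub_step, freq_ps, temperature):
    """Strip relaxation/MD keywords from control.in.basic and append MD settings."""
    kept = []
    for line in basic_text.splitlines():
        fields = line.split()
        keyword = fields[0] if fields else ""
        # Assumed filter: drop MD_* (except MD_time_step) and relax_geometry.
        is_md = keyword.startswith("MD_") and keyword != "MD_time_step"
        if is_md or keyword == "relax_geometry":
            continue
        kept.append(line)
    return "\n".join(kept + md_lines(sub_step, freq_ps, temperature)) + "\n"


basic = "xc pbe\nMD_time_step 0.001\nMD_run 1.0 NVE\nrelax_geometry bfgs 1e-2\n"
print(build_replica_control(basic, sub_step=3, freq_ps=0.1, temperature=150.0))
```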
• geometry.in.basic, written in the geometry.in format. It is copied into each subdirectory, so that every replica starts from the same geometry.
• optional: list_of_geometries. If present, it must contain a list of geometry files (each in the geometry.in format), one per line, all of which must be present in the working directory. The script copies the file named on the first line into the first subdirectory (i.e., the one related to the first temperature in control.in.rex), and so on. If list_of_geometries contains fewer lines than the defined number of replicas, the remaining replicas start from the geometry contained in geometry.in.basic.
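The assignment rule for starting geometries can be summarized in a few lines. The sketch below is only an illustration of the rule as stated above. The function name is hypothetical, the two-digit zero-based directory names follow the rex_00 naming used in the log_rex example, and ignoring surplus entries in the list is an assumption.

```python
def assign_geometries(n_rex, listed_files, default="geometry.in.basic"):
    """Map each replica subdirectory to the geometry file it starts from."""
    assignment = {}
    for i in range(n_rex):
        # Line i of list_of_geometries goes to replica i; replicas beyond
        # the end of the list fall back to geometry.in.basic.
        source = listed_files[i] if i < len(listed_files) else default
        assignment[f"rex_{i:02d}"] = source
    return assignment


# Hypothetical file names, for illustration only.
print(assign_geometries(4, ["conf_a.in", "conf_b.in"]))
```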
• rex.AIMS.pl, the managing Perl script. In principle, nothing needs to be changed here. If invoked as
perl rex.AIMS.pl stat <log_file>
in a directory that contains a log_file created by rex.AIMS.pl itself (see next section), it provides useful statistics (even on the fly).
• submit.rex is the batch script. Some attention from the user is required here, too:
  – select the total number of slots with the directive "#$ -pe impi", according to the number of replicas. For performance reasons only, it is a good idea to have the number of slots be a multiple of the number of replicas (n_rex in control.in.rex). Information and warnings concerning this issue are written to log_rex.
  – give the variable type the value ‘init’ or ‘restart’, according to the kind of run. Note that in a ‘restart’ run, the script first completes any interrupted parallel tempering steps (possibly in only some of the subdirectories) and then continues with the replica exchange algorithm.
  – set the proper name and path for the aims binary.
  – set the number of slots per node (host) with ncpupn=<#SlotsPerNode>. This is particularly important for the correct distribution of the available slots. For performance reasons only, it is a good idea to have the number of slots per replica be a multiple of the number of slots per node (ncpupn) or vice versa. Information and warnings concerning this issue are written to log_rex.
The relevant area for the settings is reproduced below:
################### to be taken care of by the user ###############
binary='<binary path and name>'
# put type='init', if initializing, 'restart' if restarting
type='init'
# type='restart'
# number of CPU per node (host)
ncpupn=<#SlotsPerNode>
###################################################################
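The two performance recommendations for submit.rex (total slots a multiple of n_rex, and slots per replica commensurate with ncpupn) can be checked before submitting. The following sketch is a hypothetical helper, not part of the distribution; the warning texts are invented and do not reproduce what the script writes to log_rex.

```python
def check_slot_layout(total_slots, n_rex, ncpupn):
    """Return (slots_per_replica, warnings) for a planned submission."""
    warnings = []
    if total_slots < n_rex:
        return 0, ["fewer slots than replicas"]
    if total_slots % n_rex != 0:
        warnings.append("total slots is not a multiple of n_rex")
    per_replica = total_slots // n_rex
    # Recommended: per_replica is a multiple of ncpupn, or vice versa.
    if per_replica % ncpupn != 0 and ncpupn % per_replica != 0:
        warnings.append("slots per replica and ncpupn are not multiples of each other")
    return per_replica, warnings


print(check_slot_layout(total_slots=32, n_rex=4, ncpupn=8))
print(check_slot_layout(total_slots=30, n_rex=4, ncpupn=8))
```

In the first call each replica gets exactly one node of 8 slots and no warning is produced; in the second call both recommendations are violated.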
• run_rex.sh is a bash script for running locally:
  – it serves as a substitute for submit.rex if SGE is not available;
  – if possible, use submit.rex instead, for performance reasons: it distributes the jobs over the available slots (CPUs) in a more sophisticated way.
4.10.2 Output
• in each of the subdirectories rex_?? there are the files:
  – temp.out, the full FHI-aims output for the parallel tempering step;
  – control.in and geometry.in, the usual FHI-aims input files. They change at each parallel tempering step and are managed by the script.
  – energy.trajectory, the cumulative (i.e., appended after each attempted swap) energy trajectory for the replica.
  – out.xyz, the cumulative geometry trajectory, in xyz format.
• in the working directory: log_rex. It contains useful information on the swapping process. A commented example for a four-replica run follows.
> Mon Apr 5 03:51:05 CEST 2010
The time of the attempted swap
> Tt 100.0 200.0 150.0 250.0
The list of the running target temperatures, first place for rex_00, and so on
> map 1 3 2 4
Map of the temperatures in the “Tt” line, into the original list given in control.in.rex
> TE -6963471.3877 -6963471.2516 -6963471.4951 -6963471.3286
Total Energy (“Total energy (el.+nuc.)”) in each replica (first item in rex_00 and so on)
> swapping 3 1 @T 150.0 100.0 accepted
> swapping 4 2 @T 250.0 200.0 rejected
Details of the attempted swaps, with outcome
> temp 150.0 200.0 100.0 250.0
List of running target temperatures, after swaps.
> vfact 1.04880884817015 1 0.9534625892455937218 1
Rescaling coefficients for the velocities in each replica, for the next step
> ####### End of rex step #################
WARNING: when the wall-clock time ends in the middle of a parallel tempering step, the following message is always printed:
WARNING: rex_??/temp.out Not converged?
Please check this problem before continuing.
If the only reason that any of the temp.out files does not reach the end of the parallel tempering step is the end of the wall-clock time, then the run can be safely restarted by setting type='restart' in submit.rex.
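The "swapping" lines of log_rex lend themselves to simple post-processing. The sketch below counts attempted and accepted swaps per temperature pair from log text in the format of the example above. It is a hypothetical illustration and does not reproduce the output of "perl rex.AIMS.pl stat". Whether the leading "> " is literally present in the file is not certain from the example, so the pattern accepts both forms.

```python
import re

# Matches e.g. "> swapping 3 1 @T 150.0 100.0 accepted" (leading "> " optional).
SWAP_RE = re.compile(
    r"^(?:>\s*)?swapping\s+(\d+)\s+(\d+)\s+@T\s+(\S+)\s+(\S+)\s+(accepted|rejected)\s*$"
)


def swap_statistics(log_text):
    """Return {(T_low, T_high): (attempted, accepted)} from log_rex text."""
    stats = {}
    for line in log_text.splitlines():
        match = SWAP_RE.match(line.strip())
        if not match:
            continue  # timestamps, Tt/map/TE/temp/vfact lines, etc.
        t_a, t_b = float(match.group(3)), float(match.group(4))
        pair = (min(t_a, t_b), max(t_a, t_b))
        attempted, accepted = stats.get(pair, (0, 0))
        stats[pair] = (attempted + 1, accepted + (match.group(5) == "accepted"))
    return stats


sample = """\
> Tt 100.0 200.0 150.0 250.0
> swapping 3 1 @T 150.0 100.0 accepted
> swapping 4 2 @T 250.0 200.0 rejected
> temp 150.0 200.0 100.0 250.0
"""
for pair, (attempted, accepted) in sorted(swap_statistics(sample).items()):
    print(pair, f"{accepted}/{attempted} accepted")
```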
• It is also possible to restart (prolong) a job that has completed successfully, i.e., after the desired number of replica exchange steps has been performed. To do so, set type='restart' in submit.rex, set ‘MAX_steps <#MaxSteps>’ in control.in.rex according to the (new) desired maximum number of steps, and replace the third number in rex_par with that same number <#MaxSteps>.
• in the working directory: out.????, where ???? is a temperature, written with 4 digits. Each out.???? is constructed by appending the temporary temp.out outputs produced at the same temperature, and thus contains the full FHI-aims output at that temperature.