Long execution time for the optimizer training step

Dear all,
I have an MD simulation from which I extracted 50,000 steps; the system consists of 27 Zr atoms, a 3x3x3 supercell of the one-atom primitive Zr unit cell. I am using the script 2_construct_fcp.py as described in
At the moment, I am interested only in the second order, so I set
cutoffs = [4.5]
for the cluster construction. I am running the script on an 8-core Intel i7-4770 CPU @ 3.40GHz with 32 GB RAM; I see that the training part runs on 8 cores, so I believe it is parallelized with OpenMP (is it LAPACK?). However, after 15 h 30 min (wall-clock time), it still has not finished. Obtaining the force constants with 10,000 steps took 170 min instead; assuming linear scaling (which is apparently wrong; how does it scale?), for 50k steps I expected it to be done in about 14 hours.
I then have the following questions:

  1. Do you have any hints to speed up the calculation significantly? Is there any option of the optimizer that controls the maximum amount of RAM to use, and hence the computation time?
  2. Considering that the symmetries have been correctly recognized, is there any parameter, apart from the cutoff, that can be set before calling the optimizer so that the training time is reduced?
  3. Which optimizer do you recommend in the case of a standard MD simulation? I read the manual, but I don’t see any guidance on this. What quantities should I consider to choose the proper optimizer?

Thanks a lot in advance for your kind reply.

All the best

  1. If you have run 50,000 steps of MD, then I would suggest only using something like every 50th structure. The MD snapshots are very correlated, so it is not necessary to include all of them, and this will likely speed up the optimization greatly.
  2. The fit method matters a lot when it comes to performance/speed; for your problem I’d suggest using least squares, which is the fastest. But you are solving the linear problem Ax = f, and the matrix A contains 50,000 × 27 × 3 ≈ 4e6 rows, which is very large. This is solved by np.linalg.lstsq, which I believe uses LAPACK. Still, 14 hours sounds like a very long time; make sure you’re using the primitive Zr structure as input to the ClusterSpace. With the primitive structure for Zr and a cutoff of 4.5Å, I would imagine you should end up with 10-50 parameters?
  3. If you have a huge amount of training data (compared to the number of parameters), as you do, then least squares is a good choice. If you don’t have a lot of data, or you have a lot of parameters, then it could be a good idea to try out other fit methods.
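The subsampling in point 1 amounts to simple slicing. A minimal sketch, using a plain Python list as a stand-in for the trajectory object (with real data this would come from e.g. ase.io.read):

```python
# Hypothetical stand-in for an MD trajectory: a plain list of snapshots.
traj = list(range(50000))   # 50,000 MD snapshots

# keep only every 50th snapshot to reduce correlation between frames
training_structures = traj[::50]

print(len(training_structures))  # 1000 structures instead of 50,000
```

This shrinks the fit matrix by a factor of 50 without losing much independent information.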

Hi Eric,
thanks a lot for your quick reply.

  1. If you have run 50,000 steps of MD, then I would suggest only using something like every 50th structure. The MD snapshots are very correlated, so it is not necessary to include all of them, and this will likely speed up the optimization greatly.

I am not convinced about this. If I use one structure every 50, I cannot sample waves with a period of 50 fs, corresponding to a frequency of 1/(50 fs) = 20 THz. For example, if I focus on one atom oscillating at that frequency, I see that atom as stationary, because every 50 fs it is back in the same place. This happens even though the dynamics itself is carried out correctly, thanks to the small time step used (1 fs), which is capable of sampling the 20 THz wave. Following your suggestion, to reduce the number of steps I should extract the trajectory every X steps, such that X satisfies the Nyquist theorem for twice the highest harmonic frequency in the system: one factor of two from the Nyquist theorem, and another factor of two to account for scattering effects, which can double the harmonic frequency. What do you think? I might be missing something here.
Concerning the correlation, why should it be an issue for the fit method? I am still missing some knowledge in this matter. I imagine that the larger the training set, the better the fit, especially since a fine sampling allows the frequencies of the system to be better resolved.

  1. The fit method matters a lot when it comes to performance/speed; for your problem I’d suggest using least squares, which is the fastest…

The structure that I provide to the ClusterSpace is primitive. Thanks for suggesting least squares, it saves me a lot of testing. I now wonder what the input for the solver is. I understand that, theoretically, the inputs are “F” and “a” as in the formulation in

which I provide with the xyz files. Then what does the structure container do? I see that there is a for loop

for structure in structures:

Does this loop accumulate the structures into some array in memory, i.e., does it load the full trajectory into memory? What exactly is the form of sc.get_fit_data(), which is passed as an argument to the optimizer? I believe it is not exactly “F” and “a” but some rearrangement of those elements; also, I believe that the full set of F and a is reduced by considering the symmetry constraints.

  1. If you have huge amount of training data…

OK, thanks a lot for the suggestion!

The sampling frequency and the Nyquist theorem are only relevant if you e.g. compute some correlation function in time, an FFT, etc. The time between two snapshots does not matter when it comes to fitting force constants with hiphive, and hence you don’t need to worry about this.

You’re only using the displacements and forces of each snapshot, and using these to fit an interatomic potential via Ax = F, where F is a vector of length N = n_snapshots × 27 × 3 containing all the force components of the n_snapshots training structures. x is the unknown parameter vector of length M, and A is the “fit matrix” (which is built from the displacements of all snapshots) of shape (N, M).
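To make these shapes concrete, here is a toy least-squares problem with assumed dimensions (1000 snapshots, 27 atoms, 4 parameters), solved with the same np.linalg.lstsq call used for the real fit:

```python
import numpy as np

# Assumed toy dimensions mirroring the description above.
n_snapshots, n_atoms, n_params = 1000, 27, 4
N = n_snapshots * n_atoms * 3   # one row per force component
M = n_params

# Random stand-ins for the fit matrix and the true parameters.
rng = np.random.default_rng(0)
A = rng.normal(size=(N, M))     # "fit matrix", shape (N, M)
x_true = rng.normal(size=M)
F = A @ x_true                  # force vector of length N

# Least-squares solve recovers the parameters.
x_fit = np.linalg.lstsq(A, F, rcond=None)[0]
print(np.allclose(x_fit, x_true))
```

With N much larger than M, the problem is heavily overdetermined, which is the usual situation when fitting force constants to MD data.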

It is not a problem, but including many training structures does make the fitting much slower. Basically, two snapshots separated by 1 fs will likely contain similar displacements and forces, and thus it is not necessary to include both of these snapshots (but it does not hurt the accuracy of the model to include both).

The structure container basically collects and builds the full A matrix and the F vector.
For example you can run the following code

A, F = sc.get_fit_data()
print(A.shape, F.shape)
params = np.linalg.lstsq(A, F, rcond=-1)[0]
fcp = ForceConstantPotential(cs, params)

How many parameters do you get? The reason I’m asking is that if you have a problem of about N = 1e6, M = 50 (which should be close to your linear problem), then I think least squares should be able to solve it in minutes, not 15 hours.

I see, you are right. The goal here is to sample enough forces and corresponding displacements, not to sample frequencies; that is, we are not obtaining the frequencies of the system as a direct solution of the Ax = F problem, so the Nyquist theorem does not affect the results. For good sampling, I would then extract randomly spaced (not too close) snapshots: following my example of the 20 THz wave, if I sample every 50 steps I would always get the same (force, displacement) pair, obtaining a redundant training set. What do you think about this?

I see, so the matrix A is really huge, as its size depends on the number of time steps, as does the vector F. I thought that the problem Ax = F was solved in a similar way as in “a-TDEP: Temperature Dependent Effective Potential for Abinit”, where the sizes of the A matrix and the F vector do not depend on the number of time steps.

Only 4:

=================== Cluster Space ====================
Spacegroup                 : Im-3m (229)
symprec                    : 1e-05
Sum rules                  : True
Length scale               : 0.1
Cutoffs                    :
==== Cutoffs =====
 body/order |  2  
     1      |  -  
     2      | 4.5 
Cell                       : Cell([[3.1004524490533205, 0.0, 0.0], [-1.0334842154926531, 2.9231345787449747, 0.0], [-1.0334842154926527, -1.461567428898554, 2.531508723241809]])
Basis                      : [[0.6335144  0.53913033 0.39467079]]
Numbers                    : [40]
Total number of orbits     : 3
Total number of clusters   : 8
Total number of parameters : 4
order | n_orbits | n_clusters
  2   |      3   |       8

I ran the 2_construct_fcp.py script step by step in an interactive python session and I realized that the bottleneck is the step

structures = prepare_structures(rattled_structures, atoms_ideal)

which takes hours! In this step, I guess that it calculates the displacements by subtracting the trajectory loaded in rattled_structures from the ideal supercell in atoms_ideal. I see that this part runs in parallel with OpenMP and by default uses all the cores available on the machine. Apart from reducing the number of configurations in the trajectory, is there any option to speed up this part? Maybe the amount of memory to be used, or something else.

This sounds like a very good idea.

Yes, all this function does is try to extract the forces and displacements for each supercell.
You don’t need to use it if you can easily calculate the displacements yourself.

What I imagine takes time is

  1. The function will find which atom in the trajectory corresponds to which lattice site in the ideal structure. You probably don’t need to do this step at all, and at most you need to do it once, not 50,000 times.
  2. Calculating the displacements considering PBC; however, if your trajectory is not wrapped, you can get the displacements simply as disp = atoms.positions - atoms_ideal.positions, which is much faster.
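If the trajectory is wrapped, a minimum-image correction is needed. A self-contained sketch with toy numbers for a diagonal (orthorhombic) cell; for general cells, ase.geometry.find_mic handles this properly:

```python
import numpy as np

# Toy minimum-image displacement extraction for a diagonal cell
# (assumed values; for general cells use ase.geometry.find_mic).
lengths = np.array([9.3, 9.3, 9.3])                # supercell edge lengths
rng = np.random.default_rng(0)
ideal = rng.uniform(0, lengths[0], size=(27, 3))   # ideal positions
disp_true = rng.normal(scale=0.05, size=(27, 3))   # true displacements

# wrapped MD positions: displaced, then folded back into the cell
positions = (ideal + disp_true) % lengths

# the raw difference is off by a lattice vector for atoms that crossed
# a cell boundary; the minimum-image convention corrects it
disp = positions - ideal
disp -= np.round(disp / lengths) * lengths

print(np.allclose(disp, disp_true))  # True
```

For an unwrapped trajectory, the raw difference alone is already correct, which is why skipping the PBC handling is so much faster.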

Yes, I can provide displacements and forces directly. I understand that these two quantities are stored in the StructureContainer(cs) through the for loop

for structure in structures:

If I have a file which already contains displacements and forces for each time step, can you tell me what is the syntax that I have to use to insert them into the sc container? In this way I can skip the step “structures = prepare_structures(rattled_structures, atoms_ideal)”.

The structure added to the SC should be the “ideal atoms object” (i.e. positions should not contain the displacements) and have the displacements and forces as “arrays”.

For example you could try a snippet like

for atoms in traj:
    disps = ...
    forces = ...
    atoms_tmp = atoms_ideal.copy()
    atoms_tmp.new_array('forces', forces)
    atoms_tmp.new_array('displacements', disps)

I am sorry, but I have only very basic knowledge of Python and I don’t know how to modify the code that you suggest for my case. Let’s imagine that I have the displacements and forces stored in a text file like this:

some comment
disp_x disp_y disp_z force_x force_y force_z
disp_x disp_y disp_z force_x force_y force_z
some comment
disp_x disp_y disp_z force_x force_y force_z
disp_x disp_y disp_z force_x force_y force_z

where I have as many blocks as the number of snapshots extracted from the trajectory, and each block has the structure:
line 1: comment
following lines: Cartesian displacements and forces, one line per atom

Now, my question is: how do I read and load these displacements and forces into sc = StructureContainer(cs)? I can understand what your code does, but I don’t know how to adapt it to this case.

Thanks again for your help.

When you were running the prepare_structures function, you must at some point have read the trajectory into Python? Can you not reuse this and simply skip the call to prepare_structures?

The file above can maybe be read with numpy.loadtxt using skiprows and max_rows? Or you can probably write a simple reader yourself by parsing the lines manually (but this will require you to write some code yourself), e.g.

import numpy as np

start_line = 2
n_atoms = 27
lines_per_snapshot = 29
end_line = start_line + n_atoms

n_snapshots_to_read = 2
with open('data', 'r') as f:
    # read all lines once
    lines = f.readlines()
    for it in range(n_snapshots_to_read):
        # read a single snapshot
        print('Reading snapshot', it)
        lines_snapshot = lines[start_line:end_line]
        disp_force_data = []
        for line in lines_snapshot:
            data = [float(fld) for fld in line.split()]
            disp_force_data.append(data)
        disp_force_data = np.array(disp_force_data)
        print('  ', disp_force_data.shape)

        # move to the next snapshot
        start_line += lines_per_snapshot
        end_line += lines_per_snapshot
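The numpy.loadtxt route mentioned above could look like this; a sketch on an in-memory toy file with 2 snapshots of 3 atoms (the real file would have 27-atom blocks):

```python
import io
import numpy as np

# Toy file in the assumed block format: one comment line, then one
# "dx dy dz fx fy fz" line per atom; 2 snapshots of 3 atoms here.
text = """snapshot 0
0.01 0.02 0.03 -0.1 -0.2 -0.3
0.04 0.05 0.06 -0.4 -0.5 -0.6
0.07 0.08 0.09 -0.7 -0.8 -0.9
snapshot 1
0.11 0.12 0.13 -1.1 -1.2 -1.3
0.14 0.15 0.16 -1.4 -1.5 -1.6
0.17 0.18 0.19 -1.7 -1.8 -1.9
"""

n_atoms = 3
lines_per_snapshot = n_atoms + 1   # data lines plus the comment line

snapshots = []
for it in range(2):
    # skip everything before this block's data, then read n_atoms rows
    block = np.loadtxt(io.StringIO(text),
                       skiprows=it * lines_per_snapshot + 1,
                       max_rows=n_atoms)
    disps, forces = block[:, :3], block[:, 3:]
    snapshots.append((disps, forces))

print(len(snapshots), snapshots[0][0].shape)
```

For a file on disk, replace io.StringIO(text) with the filename. Note that max_rows requires numpy >= 1.16.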

Thanks a lot for the suggestion.
I created a file that contains the displacements and forces at each time step, and I read it with this code:

sc = StructureContainer(cs)

with open('ndhiPhive_dispfor.xyz', 'r') as f:
    lines = f.readlines()

# header line: number of snapshots and number of atoms
n_snapshots_to_read = int(lines[0].split()[0])
n_atoms = int(lines[0].split()[1])
print("n_atoms %i n_snapshots_to_read %i" % (n_atoms, n_snapshots_to_read))

start_line = 1
end_line = start_line + n_atoms
for it in range(n_snapshots_to_read):
    print('Reading step', it)
    lines_snapshot = lines[start_line:end_line]
    atoms_tmp = atoms_ideal.copy()
    disps = []
    forces = []
    for line in lines_snapshot:
        fields = line.split()
        disps.append([float(fld) for fld in fields[0:3]])
        forces.append([float(fld) for fld in fields[3:6]])
    atoms_tmp.new_array('displacements', np.array(disps), dtype=float)
    atoms_tmp.new_array('forces', np.array(forces), dtype=float)
    sc.add_structure(atoms_tmp)
    start_line += n_atoms
    end_line += n_atoms

It looks like it’s much faster than calling the prepare_structures function. Any more hints on how to speed up the reading would be appreciated.

Thanks again for your help


No hints, but it can probably be made quite a lot faster; how to parse this type of data is not really hiphive-related.
But I would recommend separating the “parsing of structures” from creating the SC, e.g. make a script that parses all displacements and forces and writes them into a nice format, e.g. an ase database or an .extxyz file. Then in a second script you can very quickly read all the data and construct the FCP.
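A sketch of that two-step workflow, using a compressed NumPy archive as the intermediate format here (the .extxyz/ase-database route works the same way):

```python
import os
import tempfile
import numpy as np

# Script 1 (sketch): parse once, store in a fast binary format.
# Toy arrays stand in for the parsed displacements/forces.
rng = np.random.default_rng(0)
disps = rng.normal(scale=0.05, size=(5, 27, 3))   # (snapshots, atoms, xyz)
forces = rng.normal(size=(5, 27, 3))

path = os.path.join(tempfile.mkdtemp(), 'training_data.npz')
np.savez_compressed(path, displacements=disps, forces=forces)

# Script 2 (sketch): reload instantly instead of re-parsing text,
# then build the StructureContainer and FCP from these arrays.
data = np.load(path)
print(data['displacements'].shape)   # (5, 27, 3)
```

This way the slow text parsing happens only once, and every later fitting experiment starts from the binary file.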