Problem with writing a new pair style

Daniele_Rapetti · August 28, 2018, 9:16am

Hello!
I have this issue:

I’m adding the TB-SMA potential ( https://doi.org/10.1103/PhysRevB.23.6265 ) to LAMMPS as a new pair style.

The code works and gives the same result as my old program until for some reason LAMMPS calculates the neighbours in a wrong way (some first neighbours disappear form the list of some atoms).

Moreover if I try to use my code with pbc some pointers, initialized with the create method from memory, lose their address (i controlled with gdb), and this happen not only in my code, but also in other pieces of LAMMPS.

I’m working on the commit tagged “stable_22Aug2018” from the GitHub repository.

I attach my pair style and some input
Best regards

pair_smatb.cpp (14.3 KB)

pair_smatb.h (2.45 KB)

FlatGoldwitPBC.in (1022 Bytes)

IsolatedAgCu_NP.in (3.34 KB)

akohlmey · August 28, 2018, 7:01pm

Hello!
I have this issue:

I’m adding the TB-SMA potential ( https://doi.org/10.1103/PhysRevB.23.6265 ) to LAMMPS as a new pair style.

The code works and gives the same result as my old program until for some reason LAMMPS calculates the neighbours in a wrong way (some first neighbours disappear form the list of some atoms).

can you explain a bit more what you mean by this?

just FYI, you cannot depend on atoms remaining in the same order, or neighbors in the same order, as atoms move from subdomain to subdomain and also get sorted while being binned for better cache efficiency. if you need to uniquely identify atoms, you have to look at atom->tag[i] or identify the matching local index of a given tag (aka atom ID) of that atom via atom->map().

Moreover if I try to use my code with pbc some pointers, initialized with the create method from memory, lose their address (i controlled with gdb), and this happen not only in my code, but also in other pieces of LAMMPS.

again, some more specific examples (e.g. what kind of pointers?) to underline what you are observing and an explanation, why this is a problem would be very helpful.

as atoms move around, blocks of memory, e.g. for coordinates, forces, or other per-atom properties need to grow, which sometimes means, buffers need to be reallocated and data copied. that is just normal. you must not depend on these pointers addresses being identical between invocations of the compute() method.

I’m working on the commit tagged “stable_22Aug2018” from the GitHub repository.

I attach my pair style and some input

thanks, but even before looking at code, you need to confirm more details of the issue and that you have understood the memory management and data access code patterns in LAMMPS well enough.

thanks in advance for any additional information you can provide the list with,
axel.

Daniele_Rapetti · August 29, 2018, 9:39am

Hello again, and thanks for the answer.

The TB-SMA potential does not need a particular order in calculating the energies and the forces. It is not a complex potential, the main difference in complexity from a LJ is that the bounding energy for each atom is a sum of terms under a square root, and this implies that the calculation of the forces needs to be preceded by a loop where the square root term is calculated.

With “The code works” I meant that when calculation goes well the style that I wrote gives the same correct outputs (energy and forces) as my old standalone code.

can you explain a bit more what you mean by this?

With the “bin” style for calculating neighbours sometimes the neighbours of some atom are not considered: by plotting the lists of neighbours (the one that is stored in “list->firstneigh[i]”) and the position of atoms (to manually confront the distances) I found that some neighbours were ignored causing the forces and the energy to be calculated wrongly.This error gets reproduced identical if I restart a calculation with the same input file.
With the “nsq” style I didn’t get this error.

again, some more specific examples (e.g. what kind of pointers?) to underline what you are observing and an explanation, why this is a problem would be very helpful.

I think I need to answer twice to this: in fact I get two different kind of errors.

At first the pointers that got corrupted(?) were the ones that should store the value of the parameters of the potential, created with the method “memory->create” and that are declared as "double** " in the header.
An example:

I made the class plot this table (in the “compute()” method) before starting each calculation:

“”"

nt &QSI[nt] QSI[nt] *(QSI[nt]+1)
0 0xf4f040 0xf4efc0 0
1 0xf4f048 0xf4efd0 1.818
“”"
few row later I get a SIGSEGV, this is the gdb output:

“”"
Program received signal SIGSEGV, Segmentation fault.
0x00000000009c3e40 in LAMMPS_NS::PairSMATB::compute (this=0xf4e330, eflag=1, vflag=2)
at /path/to/LAMMPS_stable/src/SMATB/pair_smatb.cpp:125
125 qsiexpq = (QSI[itype][jtype]QSI[itype][jtype]) * exp(2.0q[itype][jtype]*(1.0 - dij/r0[itype][jtype]));
(gdb) p QSI[itype][jtype]
Cannot access memory at address 0x3d0a4ea2c1eae560
(gdb) p QSI[itype]
$1 = (double *) 0x3d0a4ea2c1eae558
(gdb) p &QSI[itype]
$2 = (double **) 0xf4f048
(gdb) p itype
$3 = 1
(gdb) p jtype
$4 = 1
“”"
and this don’t happen on the first atom but after some loop cycles.

Then in an attempt of making the code work I created another style, the parameters now are “double” variables, making the potential working only for one type of atoms and I got errors from various pieces of LAMMPS. for example gdb says:
“”"
ERROR on proc 0: Non-numeric atom coords - simulation unstable (src/domain.cpp:518)
Last command: minimize 1.0e-4 1.0e-6 100 1000

akohlmey · August 29, 2018, 10:13am

Hello again, and thanks for the answer.

The TB-SMA potential does not need a particular order in calculating the energies and the forces. It is not a complex potential, the main difference in complexity from a LJ is that the bounding energy for each atom is a sum of terms under a square root, and this implies that the calculation of the forces needs to be preceded by a loop where the square root term is calculated.

With “The code works” I meant that when calculation goes well the style that I wrote gives the same correct outputs (energy and forces) as my old standalone code.

can you explain a bit more what you mean by this?

With the “bin” style for calculating neighbours sometimes the neighbours of some atom are not considered: by plotting the lists of neighbours (the one that is stored in “list->firstneigh[i]”) and the position of atoms (to manually confront the distances) I found that some neighbours were ignored causing the forces and the energy to be calculated wrongly.This error gets reproduced identical if I restart a calculation with the same input file.
With the “nsq” style I didn’t get this error.

again, some more specific examples (e.g. what kind of pointers?) to underline what you are observing and an explanation, why this is a problem would be very helpful.

I think I need to answer twice to this: in fact I get two different kind of errors.

[…]

“”"

I think that I understood the data access pattern, my style is based upon the lj_cut one, so the way to access to the atoms ids, positions and to their neighbours is done following the code of that pair style.

you probably should have looked at pair_eam.cpp instead, because the sequence of steps is the same.

also, there are some conceptual issues, that you need to address.
the neighbor list you receive is a “half” neighbor list, so it contains each pair of atoms only once. with force->newton_pair == 0 (the default), this also applies to pairs that straddle subdomain boundaries. so when you accumulate data in your “den” array, there has to be communication for data to retrieve information from “ghost” atoms to sum them into the den array for “local” atoms. this “den” array also must not be of the size “natoms”, but of the size “nlocal+nghost” or just “nlocal” for force->newton_pair == 1. in the latter case the communication is not needed, as those pairs are listed twice. you can see the same procedure happening in the eam pair style for the “rho” array. you will also see a call to the comm class to invoke a “reverse communication”, which does the data retrieval and summing. please also look at the pack_reverse() and unpack_reverse() calls which are invoked by the comm class to pack data that needs to be communicated into a buffer and then unpack it. since in the second step you need to have access to “den” for local atoms and ghost atoms, you also have to request a forward calculation (see the use of the “fp” array in pair style eam, which contains data derived from “rho” and required to compute the force contribution due to the embedding energy).
in general, you also have to understand the differences between nlocal, nghost, nall (= nlocal+nghost), and natoms. the neighbor lists are indexed in terms of nlocal/nall, but not in terms of natom (aka the atom ID). so in your case, where you have “den” in the dimension of natoms, you have to index it by atom->tag[i] or atom->tag[j], and then do a call to MPI_Allreduce() to do a sum of den[] across all MPI ranks for all atoms. then you can do the computation of the energy contribution, which must be done only for one MPI rank, since now that array is replicated across all ranks and has the contributions from all atoms. however, this procedure is getting increasingly inefficient as the number of MPI rank increases. thus the procedure as in pair style eam if preferred with the reverse and forward communication.

hope this makes it more clear where your problems originate and how you can address them.

axel.

Daniele_Rapetti · August 30, 2018, 10:56am

Thanks!

Following your advice I studied more the pair_eam potential, and now I made my TB-SMA work.