Development of a molecule-molecule pair potential

Dear all,

I’m trying to develop LAMMPS interface to my new machine learning interatomic potential. The potential is pairwise in nature between molecules. Hence to calculate energy (and forces on atoms) I need to know positions of all atoms in both molecules at an instant (say 4 atoms for H2-H2 pair). The atoms in a molecule are permanently bonded (bond_style) and rigid (fix shake or fix rigid).

Do I understand correctly that bonded atoms are removed from the nearest neighbour list?
If that’s the case is there any way to obtain this information? So, for the i-j loop over all local atoms and their neighbours I need to know what are the positions of the atoms bonded to i and j.

I tried accessing neighbor->bondlist but it contains only local bonds/atoms (I need to know how ghost atoms are bonded as well) apart from that the indexing of atoms does not correspond to neighbour list indices.

That depends primarily on the “special bonds” settings which can be adjusted with the special_bonds command — LAMMPS documentation
The default setting is lj/coul 0.0 0.0 0.0, i.e. all 1-2, 1-3, and 1-4 pairs in the bond-topology are ignored in the pair-wise neighbor list. And unless you are running in combination of with a kspace style, those pairs are indeed removed. If the values are set to 1.0 instead, the pairs are included. If the values are neither (exactly) 0.0 nor (exactly) 1.0, the pairs are included, but for every “j” atom of an i,j pair in the neighbor list of atom “i” the uppermost 3 bits are used to encode whether it is an 1-2, 1-3, or 1-4 pair. You can see how this this is applied, e.g. in pair style lj/cut when you look at the use of the sbmask() inline function and the NEIGHMASK constant.

The reason for this is that this massively simplifies the computation of the bond (and angle, dihedral) force constants, as they can attributed in an additive fashion to just the interaction of the corresponding bonded pair or triple, or quad of atoms. If the pairwise interactions are not removed, then the computation of the, e.g. force constant of the bond (usually derived from a harmonic analysis of vibrational spectra) would have to have the contribution of the non-bonded interaction between the pair of atoms removed to avoid double counting. If your potential does not account for that, you will have a serious design flaw (or a massive parameterization problem).

That is not correct. The indexing is by local index (and not atom ID) in both the neigbor list and the bond list. However, the bond list is not the correct place to look for all bonded neighbors of atom “i” in a neighbor list i,j-pair, since it will by default list each bond only once (corresponding to the newton command — LAMMPS documentation) and also the neighbor list will usually not store neighbors of ghost atoms, unless explicitly flagged in the neighbor list request (and the presence of those is subject to the communication cutoff).

Besides, this is not the proper way to access the bonded neighbors or atom “i”. Instead you should loop over the contents of the atom->special[i] list. This contains the atom IDs of the neighboring atoms, first all 1-2 pair, then all 1-3 pairs and finally all 1-4 pairs. Those offsets can be looked up in atom->nspecial[i]. To convert those back to local indexing, you will have to use an atom map and then you can use atom->map() to get the local index. But this is not where the complications end, you may also have to consider that the same atom ID may be present multiple times in the same subdomain (e.g. when running in serial with periodic boundaries), then you have to process the atom->sametag[i] value which will point you to the next local index with the same atom ID.

Thank you very much for your comprehensive answer. This clarifies a lot.

One more thought. If I read your description correctly, you are essentially trying to parameterize some kind of a coarse grain model where each molecule is represented by its center of mass and its orientation. Then you may not need to use “bonds” at all, but could implement your model as an nparticle “body” style. Then the constituent atoms would be part of the “body” and your pair style would compute center of mass force and torque between multiple bodies. The molecular geometry would be primarily required to define the moments of inertia and the center of mass and to reconstruct the atomic configurations in a trajectory.

Yes, you are right. The model can be implemented as nparticle style. The reason I decided to go “bond-style” route is to account for the future models. The current model is for N2 which is well described by the rigid particle. The future models (H2 perhaps) will require to consider bond (and perhaps even bond breaking at high pressure).

I tried your suggestion to access atom i bond with atom->special[i] and atom->map() and it works well. Interestingly for some molecules the bond length is incorrect and periodic boundary condition must be reapplied. I don’t quite understand why this is the case.

I also assumed that the same idea holds for j-atoms but this does not seem to be correct. The atom->map() returns -1. Is there any way to obtain j-atom bonded neighbour position?

Sorry, you are losing me here. If you want the kind of flexibility you are describing, you should just consider all neighbors and not use any explicit bonds. There are several approaches already available in LAMMPS and a bunch more as external packages or about to be added. I suggest you read the corresponding of the ones available. There are differences in how much “Chemistry” the various approaches add to the model and how much computational effort goes into them.
I am failing to see what you can gain with your approach unless you are really using it to reduce the number of sites to consider and thus just have parameters “per molecule” (there is a similar empirical potential for water, mW which uses the Stillinger-Weber manybody potential function, but has only one site per water, but can represent properties that empirical 3-site or 4-site explicit potentials cannot do). Having explicit bonds, possibly with empirical potentials would only make things slower and more complex since you have to avoid double counting of interactions.

atom->map() will give you the local index of one atom with the given tag, but there may be multiple. You should not need to apply periodic boundaries, but rather look at the m = atom->sametag[i] atom and then the next m = atom->sametag[m] atom unless it returns -1. Or save yourself some effort and use the domain->closest_image() function.

Most of the per-atom data is only available for “local” atoms, i.e. atoms with an index < atom->nlocal. The neighbor atoms “j” may be “ghost” atoms, i.e. copies of atoms from a neighboring subdomain.
Those are supposed to return -1 with atom->map() unless there is a periodic image of them somewhere.

I feel I should provide some more background on what I’m trying to achieve. I have developed a machine learning library for training of interatomic potentials. So far I have written an interface to LAMMPS which supports two-body and many-body type potentials (I modified EAM style). The idea here is to have one interface which supports many different types of descriptors/fingerprints and is also independent of a regression method (kernels, neuron nets,…). In other words the computation of forces and energies is done by the ML-library which is linked to LAMMPS. LAMMPS does the rest.

So now I’m toying with an idea of adding another “type” of potential which is a pairwise potential between dimers. Ideally there would be one interface to cover all three cases:

  1. Rigid molecules
  2. Permanently bonded molecules (e.g. harmonic)
  3. No permanent bonds

Ok, I see now that this is the way to go if I want to cover all 3 cases with one interface. I was hoping that case 1 and 2 can be achieved with permanent “bonds” (and save me some work) and case 3 would require different approach (all neighbours and no explicit bonds). Alternatively case 1 can be done with coarse-grained model as you suggested.

Do I read it correctly that there is no way to get j-atom bonded neighbour from the atom->special array? In this case is there any alternative approach (with explicit bonds) to get this info? The only way I can think of for now is it to request nearest neighbours for ghost atoms and include bonded atoms in those list with special_bonds. But this defeats the purpose of using bonds in the first place.

First off a little reminder. LAMMPS is designed to handle large systems in parallel using domain decomposition. Thus topology information like bonds, angles, etc. are stored with individual atoms.
The bond list is created from that information. If you look at the comment in the src/atom.h file you see that there are class members of the Atom class that have this information.

  int **nspecial;      // 0,1,2 = cumulative # of 1-2,1-3,1-4 neighs
  tagint **special;    // IDs of 1-2,1-3,1-4 neighs of each atom
  int maxspecial;      // special[nlocal][maxspecial]

  int *num_bond;
  int **bond_type;
  tagint **bond_atom;

  int *num_angle;
  int **angle_type;
  tagint **angle_atom1, **angle_atom2, **angle_atom3;

  int *num_dihedral;
  int **dihedral_type;
  tagint **dihedral_atom1, **dihedral_atom2, **dihedral_atom3, **dihedral_atom4;

  int *num_improper;
  int **improper_type;
  tagint **improper_atom1, **improper_atom2, **improper_atom3, **improper_atom4;

So the special list gives you all atom IDs (aka tag) of atoms connected to an atom via bonded interactions. The num_bond, bond_type and bond_atom list give you the number of bonds stored with an atom and their bond partners and respective bond types. bond_atom will only list the other atom, since bonds are symmetric. For angles and beyond all atom IDs are stored as you need to know the order, too. There is one more quirk to remember, though. LAMMPS handles bonded interactions differently depending on the Force::newton_bond setting. If it is on (the default) each bonded interaction is only stored once, i.e. the bond between atoms 15 and 16 may be stored with either atom 15 or 16. In the bond list it will thus appear only on the sub-domain where the atom “owning” the bond is a local atom. If the setting is off, the bonded interaction is stored with every involved atom, i.e. bonds are stored twice, angles 3x, dihedrals 4x. So when you are traversing the special list, you need to look for bonds with both atoms of a special pair.

You may want to look at the code for compute fragment/atom to gain some more insight into how to traverse the special list.

I’m revisiting the old problem which I couldn’t solve at the first attempt. Perhaps I can do better this time with a little bit more help :wink:

Here is a quick summary.
The system is composed of rigid dimers (N-N), i.e. every atom has one bond.
I’m iterating over all i1-atoms in a box, followed by a loop over nearest neighbours of i1-atom. For each atom i1 I can identify bonded atom i2. That works fine. The issue I’m having is that my approach occasionally fails when trying to find j1 bonded atom. Here is the relevant part of the code I’m using:

for (jj = 0; jj < jnum; jj++) {
    int j1 = jlist[jj]; 
    j1 &= NEIGHMASK;
    int n = nspecial[j1][0]; // number of bonded atoms either 0 or 1
    if (n==0)  continue;    // this is ad hoc. why is it zero sometimes?
    j2 = special[j1][0];    // get global index of j2
    j2 = atom->map(j2); // convert global j2 to local j2 
    j2 = domain->closest_image(j1,j2); // find nearest image of j2 wrt j1
    // distance between j1 j2 should be approx equal to bond length.
    // compute energy and forces....

The main problem I find with this code is that it fails to find a bonded atom to j1 when local index j1 is equal to atom->nlocal. So, the index j1 goes up to nlocal which I find surprising as I thought it should be j1<nlocal.
This is the only case. I verified this by running long simulation where molecules just freely move and ignore each other (no forces).

Also, why nspecial[j1][0] is either zero or one given all atoms have one bond?

I do not find it surprising. You are iterating through the neighbors of the local atoms “i”, and the index of those ranges from 0 to atoms->nlocal + atoms->nghost.

How many ghost atoms you have depends on your geometry, communication cutoff, boundary conditions, and number of MPI processes.

I’m testing the code on a simple cubic system with the periodic boundary in xyz. The code is compiled as serial and am using the full neighbour lists. The index of j1 is never greater than atom->nlocal. I can see that it can be the case for the half neighbour list.

I also run some further checks with different geometries - various tetragonal boxes and different basis - the problem is exactly the same.

The code fails to return bonded atom j2 only when j1 is equal to atom->nlocal, all other cases works fine. In fact I also checked for the half neighbour list and I found exactly the same behaviour.

It is difficult to assess from remote what you actually have done and what your system is like. But what you claim is not correct. If it were, a lot of what LAMMPS does, would not work correctly.

Please consider this simple minimal system, run it, and look at the neighbor list summary at the bottom of the output.

atom_style bond
region box block 0 1 0 1 0 1
create_box 1 box bond/types 1 extra/bond/per/atom 1 extra/special/per/atom 1

create_atoms 1 single 0.5 0.5 0.45
create_atoms 1 single 0.5 0.5 0.55
create_bonds single/bond 1 1 2

pair_style zero 1.0 full
pair_coeff * *
mass * 1.0

bond_style zero
bond_coeff 1 0.1

run 1

You should get:

Nlocal:              2 ave           2 max           2 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:             52 ave          52 max          52 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:              0 ave           0 max           0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs:           24 ave          24 max          24 min
Histogram: 1 0 0 0 0 0 0 0 0 0

As you can see, there are 2 local and 52 ghost atom (i.e. due to the communication cutoff of 1.3 \sigma and a box length of 1.0 \sigma, the domain decomposition will create a 3x3x3 size system (2*27 = 2+52 = 54). Combined there are 24 neighbors, 12 for each atom, since there are 2 ghost atoms each within the neighbor list cutoff in +/- x-, y-, and z-direction.

Please note that due to the default setting of special_bonds lj/coul 0.0 0.0 0.0 the local copies of the two local atoms are not within the neighbor list, as documented for the special_bonds command — LAMMPS documentation

You have to change it to something like special_bonds lj/coul 1.0e-100 1.0e-100 1.0e-100 to have those included, as can be seen from the number of full neighbors increasing by 2 to 26 (each of the two atoms in the bond become an additional neighbor for the other atom).