Development of a molecule-molecule pair potential

Dear all,

I’m trying to develop LAMMPS interface to my new machine learning interatomic potential. The potential is pairwise in nature between molecules. Hence to calculate energy (and forces on atoms) I need to know positions of all atoms in both molecules at an instant (say 4 atoms for H2-H2 pair). The atoms in a molecule are permanently bonded (bond_style) and rigid (fix shake or fix rigid).

Do I understand correctly that bonded atoms are removed from the nearest neighbour list?
If that’s the case is there any way to obtain this information? So, for the i-j loop over all local atoms and their neighbours I need to know what are the positions of the atoms bonded to i and j.

I tried accessing neighbor->bondlist but it contains only local bonds/atoms (I need to know how ghost atoms are bonded as well) apart from that the indexing of atoms does not correspond to neighbour list indices.

That depends primarily on the “special bonds” settings which can be adjusted with the special_bonds command — LAMMPS documentation
The default setting is lj/coul 0.0 0.0 0.0, i.e. all 1-2, 1-3, and 1-4 pairs in the bond-topology are ignored in the pair-wise neighbor list. And unless you are running in combination of with a kspace style, those pairs are indeed removed. If the values are set to 1.0 instead, the pairs are included. If the values are neither (exactly) 0.0 nor (exactly) 1.0, the pairs are included, but for every “j” atom of an i,j pair in the neighbor list of atom “i” the uppermost 3 bits are used to encode whether it is an 1-2, 1-3, or 1-4 pair. You can see how this this is applied, e.g. in pair style lj/cut when you look at the use of the sbmask() inline function and the NEIGHMASK constant.

The reason for this is that this massively simplifies the computation of the bond (and angle, dihedral) force constants, as they can attributed in an additive fashion to just the interaction of the corresponding bonded pair or triple, or quad of atoms. If the pairwise interactions are not removed, then the computation of the, e.g. force constant of the bond (usually derived from a harmonic analysis of vibrational spectra) would have to have the contribution of the non-bonded interaction between the pair of atoms removed to avoid double counting. If your potential does not account for that, you will have a serious design flaw (or a massive parameterization problem).

That is not correct. The indexing is by local index (and not atom ID) in both the neigbor list and the bond list. However, the bond list is not the correct place to look for all bonded neighbors of atom “i” in a neighbor list i,j-pair, since it will by default list each bond only once (corresponding to the newton command — LAMMPS documentation) and also the neighbor list will usually not store neighbors of ghost atoms, unless explicitly flagged in the neighbor list request (and the presence of those is subject to the communication cutoff).

Besides, this is not the proper way to access the bonded neighbors or atom “i”. Instead you should loop over the contents of the atom->special[i] list. This contains the atom IDs of the neighboring atoms, first all 1-2 pair, then all 1-3 pairs and finally all 1-4 pairs. Those offsets can be looked up in atom->nspecial[i]. To convert those back to local indexing, you will have to use an atom map and then you can use atom->map() to get the local index. But this is not where the complications end, you may also have to consider that the same atom ID may be present multiple times in the same subdomain (e.g. when running in serial with periodic boundaries), then you have to process the atom->sametag[i] value which will point you to the next local index with the same atom ID.

Thank you very much for your comprehensive answer. This clarifies a lot.

One more thought. If I read your description correctly, you are essentially trying to parameterize some kind of a coarse grain model where each molecule is represented by its center of mass and its orientation. Then you may not need to use “bonds” at all, but could implement your model as an nparticle “body” style. Then the constituent atoms would be part of the “body” and your pair style would compute center of mass force and torque between multiple bodies. The molecular geometry would be primarily required to define the moments of inertia and the center of mass and to reconstruct the atomic configurations in a trajectory.

Yes, you are right. The model can be implemented as nparticle style. The reason I decided to go “bond-style” route is to account for the future models. The current model is for N2 which is well described by the rigid particle. The future models (H2 perhaps) will require to consider bond (and perhaps even bond breaking at high pressure).

I tried your suggestion to access atom i bond with atom->special[i] and atom->map() and it works well. Interestingly for some molecules the bond length is incorrect and periodic boundary condition must be reapplied. I don’t quite understand why this is the case.

I also assumed that the same idea holds for j-atoms but this does not seem to be correct. The atom->map() returns -1. Is there any way to obtain j-atom bonded neighbour position?

Sorry, you are losing me here. If you want the kind of flexibility you are describing, you should just consider all neighbors and not use any explicit bonds. There are several approaches already available in LAMMPS and a bunch more as external packages or about to be added. I suggest you read the corresponding of the ones available. There are differences in how much “Chemistry” the various approaches add to the model and how much computational effort goes into them.
I am failing to see what you can gain with your approach unless you are really using it to reduce the number of sites to consider and thus just have parameters “per molecule” (there is a similar empirical potential for water, mW which uses the Stillinger-Weber manybody potential function, but has only one site per water, but can represent properties that empirical 3-site or 4-site explicit potentials cannot do). Having explicit bonds, possibly with empirical potentials would only make things slower and more complex since you have to avoid double counting of interactions.

atom->map() will give you the local index of one atom with the given tag, but there may be multiple. You should not need to apply periodic boundaries, but rather look at the m = atom->sametag[i] atom and then the next m = atom->sametag[m] atom unless it returns -1. Or save yourself some effort and use the domain->closest_image() function.

Most of the per-atom data is only available for “local” atoms, i.e. atoms with an index < atom->nlocal. The neighbor atoms “j” may be “ghost” atoms, i.e. copies of atoms from a neighboring subdomain.
Those are supposed to return -1 with atom->map() unless there is a periodic image of them somewhere.

I feel I should provide some more background on what I’m trying to achieve. I have developed a machine learning library for training of interatomic potentials. So far I have written an interface to LAMMPS which supports two-body and many-body type potentials (I modified EAM style). The idea here is to have one interface which supports many different types of descriptors/fingerprints and is also independent of a regression method (kernels, neuron nets,…). In other words the computation of forces and energies is done by the ML-library which is linked to LAMMPS. LAMMPS does the rest.

So now I’m toying with an idea of adding another “type” of potential which is a pairwise potential between dimers. Ideally there would be one interface to cover all three cases:

  1. Rigid molecules
  2. Permanently bonded molecules (e.g. harmonic)
  3. No permanent bonds

Ok, I see now that this is the way to go if I want to cover all 3 cases with one interface. I was hoping that case 1 and 2 can be achieved with permanent “bonds” (and save me some work) and case 3 would require different approach (all neighbours and no explicit bonds). Alternatively case 1 can be done with coarse-grained model as you suggested.

Do I read it correctly that there is no way to get j-atom bonded neighbour from the atom->special array? In this case is there any alternative approach (with explicit bonds) to get this info? The only way I can think of for now is it to request nearest neighbours for ghost atoms and include bonded atoms in those list with special_bonds. But this defeats the purpose of using bonds in the first place.

First off a little reminder. LAMMPS is designed to handle large systems in parallel using domain decomposition. Thus topology information like bonds, angles, etc. are stored with individual atoms.
The bond list is created from that information. If you look at the comment in the src/atom.h file you see that there are class members of the Atom class that have this information.

  int **nspecial;      // 0,1,2 = cumulative # of 1-2,1-3,1-4 neighs
  tagint **special;    // IDs of 1-2,1-3,1-4 neighs of each atom
  int maxspecial;      // special[nlocal][maxspecial]

  int *num_bond;
  int **bond_type;
  tagint **bond_atom;

  int *num_angle;
  int **angle_type;
  tagint **angle_atom1, **angle_atom2, **angle_atom3;

  int *num_dihedral;
  int **dihedral_type;
  tagint **dihedral_atom1, **dihedral_atom2, **dihedral_atom3, **dihedral_atom4;

  int *num_improper;
  int **improper_type;
  tagint **improper_atom1, **improper_atom2, **improper_atom3, **improper_atom4;

So the special list gives you all atom IDs (aka tag) of atoms connected to an atom via bonded interactions. The num_bond, bond_type and bond_atom list give you the number of bonds stored with an atom and their bond partners and respective bond types. bond_atom will only list the other atom, since bonds are symmetric. For angles and beyond all atom IDs are stored as you need to know the order, too. There is one more quirk to remember, though. LAMMPS handles bonded interactions differently depending on the Force::newton_bond setting. If it is on (the default) each bonded interaction is only stored once, i.e. the bond between atoms 15 and 16 may be stored with either atom 15 or 16. In the bond list it will thus appear only on the sub-domain where the atom “owning” the bond is a local atom. If the setting is off, the bonded interaction is stored with every involved atom, i.e. bonds are stored twice, angles 3x, dihedrals 4x. So when you are traversing the special list, you need to look for bonds with both atoms of a special pair.

You may want to look at the code for compute fragment/atom to gain some more insight into how to traverse the special list.