Questions regarding extending LAMMPS

Hi,

I'm working on a pair style and have a few questions regarding the implementation:

  1. What is the consensus regarding STL containers?

I rarely see them used in the existing source files and wondered if there's a reason for this.
I see the point in using centralized memory management for larger structures,
but they could be very useful for local temporaries.

  2. What's the stance towards C++11 (and beyond)?
    Should I rather stick to C++03 features if possible?

(Maybe someone could also document any such limitations in the relevant section of the manual?)

  3. More in-depth now:
    I'm not quite sure which information is accessible on each process, and how.

For each interaction between two atoms, I need the nearest neighbors of each of them.
Given that I have to use full neighbor lists, I cannot guarantee that if I determine all neighbors for all local atoms i, the interacting atoms j also have locally determined nearest neighbors, as they are possibly not local atoms.
So I have to synchronize this information between all processes, but as far as I understand there is no trivial mapping between the indices of local atoms and nonlocal atoms on another process. Are there helper routines akin to MPI_Allgather that take care of the mapping process?
Or will this introduce more overhead than just performing my (not overly complicated) calculations twice, on the ij and ji iterations?

Thank you very much in advance,
Sebastian

First off, there will possibly be some additional comments or suggestions
from other developers. What is written below is my personal view on
things. Each developer has their own views and preferences. We mostly
agree, but not always and not on all issues. If a compromise cannot be
found, Steve has the last word and gets to make the final decision.

> Hi,
>
> I'm working on a pair style and have a few questions regarding the
> implementation:
>
> 1. What is the consensus regarding STL containers?
> I rarely see them used in the existing source files and wondered if
> there's a reason for this.
> I see the point in using centralized memory management for larger
> structures, but they could be very useful for local temporaries.

A lot of the LAMMPS code goes back to times when the STL could not be
trusted, or is derived from such code.
Using STL containers in class definitions should be avoided. Also,
template functions in general make debugging much more difficult, so
most LAMMPS developers prefer to avoid them outside of cases where
they have a significant benefit over plain C++ code.
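
As a point of reference, here is a minimal sketch of the centralized
memory management mentioned above, using the Memory class that LAMMPS
code typically goes through for larger arrays (the array dimensions and
the "pair:scratch" tag string are just placeholders):

    // inside a class derived from Pointers (e.g. a pair style),
    // where the "memory" and "atom" members are already available
    double **scratch;                           // temporary per-atom work array
    memory->create(scratch, atom->nmax, 3, "pair:scratch");
    // ... fill and use scratch ...
    memory->destroy(scratch);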

> 2. What's the stance towards C++11 (and beyond)?

KOKKOS is the only part of LAMMPS that requires it.

> Should I rather stick to C++03 features if possible?

Yes. There are still machines "in the wild" running Red Hat 5.x,
which means GCC 3.x is the default compiler.
The more advanced C++ features you use, the less portable your code
becomes and the more "bug" reports there will be.
In the end, it is your choice how you write your code, and the GPL
permits you to modify LAMMPS any way you like and redistribute the
modified version yourself.
However, it may be that your changes will not be accepted into the
LAMMPS distribution itself because of it.
Regardless of whether you use more recent C++ features or not,
readability and cleanliness of the code are the most important
properties.

> (Maybe someone could also document any such limitations in the
> relevant section of the manual?)

There are no fixed rules, just preferences of the developers.

> 3. More in-depth now:
> I'm not quite sure which information is accessible on each process, and how.

Check out the developer's guide and read through the comments in the header files.

> For each interaction between two atoms, I need the nearest neighbors
> of each of them.
> Given that I have to use full neighbor lists, I cannot guarantee that
> if I determine all neighbors for all local atoms i, the interacting
> atoms j also have locally determined nearest neighbors, as they are
> possibly not local atoms.

That statement is not correct. Each processor has "owned", i.e.
local, atoms plus atoms from neighboring subdomains, i.e. ghost atoms.
The number of ghost atoms is determined by the communication cutoff,
which is by default the largest pairwise cutoff plus the neighbor list
skin.
You can choose through flags in the neighbor list request whether you
want neighbors only for local atoms (that is the default; the neighbors
themselves may be local or ghost atoms) or also neighbors of ghost
atoms (for as long as they are within the communication cutoff). The
latter is used by advanced pair styles like airebo to access neighbors
of neighbors and compute interactions up to dihedrals dynamically from
the pairwise neighbor list (and custom sublists derived from it).
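
As an illustration, a minimal sketch of such a request in a pair
style's init_style(), assuming the NeighRequest flag interface used by
pair styles like airebo (the class name PairMyStyle is a placeholder,
and the exact request call varies somewhat between LAMMPS versions):

    void PairMyStyle::init_style()
    {
      // ask for a full neighbor list that also contains neighbors of ghost atoms
      int irequest = neighbor->request(this);
      neighbor->requests[irequest]->half = 0;   // default is a half list; switch it off
      neighbor->requests[irequest]->full = 1;   // full list: both i-j and j-i pairs
      neighbor->requests[irequest]->ghost = 1;  // also build lists for ghost atoms
    }
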
In general, you do not even need all local neighbors. E.g. pair style
eam only uses a half neighbor list, even though it needs to compute
the contributions from all neighbors to the embedding energy and its
derivative. This is done by adding information into per-atom arrays
and then using a reverse communication to add contributions tallied
on ghost atoms to their corresponding local atoms in a different
subdomain (and then a forward communication to push the results back
out to all ghost atoms).
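
For illustration, a rough sketch of that forward/reverse communication
pattern for a single per-atom quantity, loosely modeled on pair style
eam (the class name PairMyStyle and the per-atom array rho are
placeholders; the comm calls shown are the ones used by pair styles of
this era and may be named differently in other LAMMPS versions):

    // in the constructor: one double per atom in each direction
    comm_forward = 1;
    comm_reverse = 1;

    // in compute(), after tallying contributions into rho[] for local and ghost atoms:
    comm->reverse_comm_pair(this);   // fold ghost-atom contributions back onto their owners
    // ... evaluate per-atom terms from the now-complete rho[] of owned atoms ...
    comm->forward_comm_pair(this);   // push the results back out to the ghost atoms

    // callbacks invoked by the Comm class for the reverse direction:
    int PairMyStyle::pack_reverse_comm(int n, int first, double *buf)
    {
      int m = 0;
      for (int i = first; i < first + n; i++) buf[m++] = rho[i];
      return m;
    }

    void PairMyStyle::unpack_reverse_comm(int n, int *list, double *buf)
    {
      int m = 0;
      for (int i = 0; i < n; i++) rho[list[i]] += buf[m++];
    }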

> So I have to synchronize this information between all processes, but
> as far as I understand there is no trivial mapping between the indices
> of local atoms and nonlocal atoms on another process. Are there helper
> routines akin to MPI_Allgather that take care of the mapping process?
Not correct: atoms can be uniquely identified by their "tag" property,
and the mapping of a tag to a local atom index can be obtained through
Atom::map() (which returns -1 for atoms not present on this process).
Also, as mentioned above, information can be easily and efficiently
(i.e. in parallel) communicated for typical problems with the
forward/reverse communication pattern.
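
For example, a minimal sketch of that tag lookup (the use of
Domain::closest_image() to pick the periodic image nearest to a given
local atom i is an assumption about what a pair style typically wants;
note that the lookup requires an atom map, which exists for molecular
atom styles or when enabled via atom_modify map):

    int j = atom->map(jtag);     // jtag is the global atom ID (tagint)
    if (j < 0) {
      // atom is neither owned nor a ghost here: its data is not available on this process
    } else {
      // with periodic boundaries several images may be present locally;
      // pick the image closest to local atom i
      j = domain->closest_image(i, j);
      // x[j], f[j], etc. can now be accessed
    }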

> Or will this introduce more overhead than just performing my (not
> overly complicated) calculations twice, on the ij and ji iterations?

Collecting all information from all MPI ranks on each subdomain is a
bad idea and should be avoided at all costs. It undoes the domain
decomposition and will seriously limit parallel scaling. The
architecture of LAMMPS is such that it only communicates information
between nearby atoms (up to the communication cutoff), which leads to
its excellent weak scaling properties. If information absolutely needs
to be passed around all processes, you can look into the ring
communication class, for example.

axel.