Calculating a global scalar and a per-atom array in the same compute

Dear LAMMPS community!

I've been developing a compute that calculates a per-atom array and
reuses some of the quantities to calculate a global scalar if requested
in the input script. To do so, I implemented a compute_peratom() method
as well as a compute_scalar() method, where the latter calls the former
in order to have up-to-date data available.

However, I recently ran into trouble with this approach when using
neighbor lists within the compute (they contained indices that were
larger than nlocal+nghost when the compute_peratom() method was called
from the compute_scalar() method). So I have a couple of questions:

-) Firstly, this kind of use, combining a scalar and a per-atom quantity,
is not done in any of the computes that come with LAMMPS (at least I
couldn't find one), which leads me to believe that this was not
actually the intended use. Should one strictly separate per-atom and
global computes?
-) Is there some sort of guarantee that compute_peratom() is called
before compute_scalar()? It works in the examples I have tried;
however, I am not sure this holds in general.
-) I use the neighbor list in pretty much the same way it is used in
other computes that calculate per-atom values. Is there a different way
to use it from the compute_scalar() method?

I can also provide a minimal example of the issue with the neighbor
lists mentioned above, if that is of any help.

Thank you for your help!

Best regards,
Clemens Moritz

-) Firstly, this kind of use, combining a scalar and a per-atom quantity,
is not done in any of the computes that come with LAMMPS (at least I
couldn't find one), which leads me to believe that this was not
actually the intended use. Should one strictly separate per-atom and
global computes?

no. what you do is a valid approach.

-) Is there some sort of guarantee that compute_peratom() is called
before compute_scalar()? It works in the examples I have tried;
however, I am not sure this holds in general.

if you explicitly call compute_peratom() from within compute_scalar()
that *is* your guarantee.
if needed, i would expand tests on invoked_scalar and invoked_peratom
to avoid having to do the computation multiple times on the same
timestep.
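
for illustration, a minimal sketch of such a guard (ComputeMine is a
placeholder class name; the scalar is assumed to be some reduction of
the per-atom data):

double ComputeMine::compute_scalar()
{
  invoked_scalar = update->ntimestep;

  // recompute the per-atom data only if it is not already
  // current for this timestep
  if (invoked_peratom != update->ntimestep) compute_peratom();

  // ... reduce the per-atom values into the global scalar here ...

  return scalar;
}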

-) I use the neighbor list in pretty much the same way it is used in
other computes that calculate per-atom values. Is there a different way
to use it from the compute_scalar() method?

no.
how much larger are those indices? very much?
do you by any chance have a molecular system and thus some neighbors
may be flagged as special bonds?
if yes, did you apply NEIGHMASK properly to the j indices?

axel.

Thanks for the quick reply!

if you explicitly call compute_peratom() from within compute_scalar()
that *is* your guarantee.
if needed, i would expand tests on invoked_scalar and invoked_peratom
to avoid having to do the computation multiple times on the same
timestep.

Okay, very good. This is exactly what I do.

no.
how much larger are those indices? very much?
do you by any chance have a molecular system and thus some neighbors
may be flagged as special bonds?
if yes, did you apply NEIGHMASK properly to the j indices?

I’ve seen the indices exceed nlocal+nghost by as much as 50, but nothing outlandish.

I simulate a system of water molecules using fix rattle. The NEIGHMASK is applied as follows:

// invoke full neighbor list (will copy or build if necessary)
neighbor->build_one(list);

int inum = list->inum;
int *ilist = list->ilist;
int *numneigh = list->numneigh;
int **firstneigh = list->firstneigh;
int *mask = atom->mask;
tagint *tag = atom->tag;

for (int ii = 0; ii < inum; ++ii) {
  int i = ilist[ii];
  if (mask[i] & groupbit) {
    int *jlist = firstneigh[i];
    for (int jj = 0; jj < numneigh[i]; ++jj) {
      int j = jlist[jj];
      j &= NEIGHMASK;                    // strip the special-bond bits
      if (jj == 0) vector_atom[i] = j;
      // every masked neighbor index must refer to a local or ghost atom
      assert(j < (atom->nlocal + atom->nghost));
    }
  }
}

The actual problem occurs when I run two simulations from the same input file. The first simulation (10000 steps) runs smoothly, but the second one crashes immediately by failing the assert() at the end of the inner loop above.

I am using the 16 Feb 2016 version with the RIGID, MOLECULE, KSPACE, and VORONOI packages. I’ve attached the minimal example; a description is included.

Best regards,
Clemens Moritz

min_example.tgz (39.5 KB)

The issue with compute output (and with fixes I think) is that a compute
should only produce one of the following: scalar, per-atom, or local data.
This is b/c the commands that use compute output, like variables, and some
other computes, like compute reduce, do not have the ability in their
syntax to distinguish between them. E.g. a per-atom compute accesses your
compute with ID foo. How can it tell whether to use the scalar or per-atom
values if it references c_foo?

I can only recall one exception to this, which is compute voronoi/atom,
which can produce either per-atom or local data. Aidan convinced me this
was a good idea in this case. But we made sure it was carefully documented,
and that there was an option for the user to turn off one part of the
output if they wanted to access the other part. See the bottom of the
compute voronoi/atom doc page.

Re: how computes are called. That is really up to you and your input
script. LAMMPS does not call any computes on its own. They are only called
on timesteps that your input script requests they be called. Again, that
means that some other command wants the output from the compute. The logic
it uses to call the compute depends on the scalar, peratom, and local flags
that the compute sets. If you set multiple of them, then the caller may get
confused as to which portion of the compute to call. It may not call
multiple portions, b/c none of the callers expect multiple portions to be
enabled.

Steve

The issue with compute output (and with fixes I think) is
that a compute should only produce one of the following:
scalar, per-atom, local data. This is b/c the commands
that use compute output like variables, and some other
computes like compute reduce, do not have the ability
in their syntax to distinguish between them. E.g. a per-atom
compute accesses your compute with ID foo. How can it tell
whether to use the scalar or per-atom values if it
references c_foo?

i think returning the compute_peratom() data in a per-atom context and the
compute_scalar() data in a scalar context is a reasonable behavior, and
this is what LAMMPS currently is doing. if i want to access the scalar data
in a per-atom context, i can easily enforce a scalar context by rerouting
access via an equal-style variable. i've found that this works satisfactorily
with several of the computes in USER-TALLY, and it beats having to define a
costly compute twice or having to go through compute reduce plus another
equal-style variable to process the per-atom data into a scalar.

there is at least one more compute that has both compute_scalar() and
compute_peratom() (smd/vol).

axel.

I can only recall one exception to this, which is compute voronoi/atom

Thank you both for your input!

I have run into this ambiguity about which kind of output is used with my computes when doing histograms of the calculated values, and I’ve avoided it exactly in the way Axel suggests.

However, if this pattern of packaging scalar and per-atom calculations into the same compute is valid, then something weird seems to be going on with the neighbour lists when they are called from a compute_scalar() method right after a simulation is started, and I don’t think it is related to the NEIGHMASK.

Best,
Clemens

i just compiled and ran your test code/input with the latest
development code and cannot reproduce the neighborlist issue you are
reporting.
the input happily continues after the first 10000 steps.

axel.

Okay,

now this is mysterious. I just pulled a vanilla copy of LAMMPS (latest
git checkout), added the compute and ran the simulation - and got the
error - on two different machines.

Did you use 4 processes as well? I did not get the error when using 16
processes...

I've attached my makefile and a log, maybe there is a clue in there?

Best,
Clemens

Makefile.openmpi_debug (2.88 KB)

output (22.4 KB)

Okay,

now this is mysterious. I just pulled a vanilla copy of LAMMPS (latest
git checkout), added the compute and ran the simulation - and got the
error - on two different machines.

Did you use 4 processes as well? I did not get the error when using 16
processes...

yes. but i also ran on a couple of other machines, and *did* get the crashes.

I've attached my makefile and a log, maybe there is a clue in there?

i've been digging around a bit and i think i have found a way to
handle this, but i am copying steve, because i don't fully understand
*why* this hack has the effect and why it is needed in the first
place. i hope that steve can weigh in and provide an explanation.

on one of the machines where your test input/code fails, i tried all
the usual suspects (debugger, valgrind) and couldn't really find any
explanation of what is happening. after adding a hack to bypass the
assertion, i noticed that this would only happen until the first
neighbor list rebuild, and then the assertions passed cleanly. so
my conclusion is that something is not yet ready. this seems to be
solvable by changing:

neighbor->build_one(list);

into

neighbor->build_one(list,1);

there were a few more changes, so my total patch is below. some comments on it:
- initializing iMaxAtoms to -1 makes certain that vector_atom[] can
  always be dereferenced. with 0, you may not trigger allocation on
  sparse systems where an MPI rank may not own any (local) atoms.
- for the same reason, it is safer to use atom->nmax (maximum value of
  atom->nlocal+atom->nghost across all MPI ranks) instead of
  atom->nlocal. it doesn't make a difference here, but could be
  important in case you need to do some kind of MPI_Allreduce() on
  this. (see the sketch after this list)
- i made sure there is no memory leak with the per-atom storage and that
  at a run 0 or when continuing a run, the computes always execute at
  least once.
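
for reference, the growth pattern referred to above looks roughly like
this (iMaxAtoms and vector_atom are the member names from the attached
example, so this is only a sketch of the idea, not the actual patch):

if (atom->nmax > iMaxAtoms) {
  memory->destroy(vector_atom);   // safe no-op while vector_atom is still NULL
  iMaxAtoms = atom->nmax;
  memory->create(vector_atom, iMaxAtoms, "compute:vector_atom");
}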

HTH,
     axel.

--- /home/akohlmey/Downloads/min_example/compute_test_nbl.cpp    2016-05-03 10:13:26.000000000 -0400
+++ compute_test_nbl.cpp 2016-05-04 17:43:07.383184818 -0400
@@ -26,7 +26,7 @@

TestNBL::TestNBL(LAMMPS *lmp, int narg, char **arg) :
   Compute(lmp, narg, arg),
- iMaxAtoms(0)
+ iMaxAtoms(-1)
{
   // ------------------------------------------------------------------------------
   // -------------------------- Reading arguments

i am copying steve, because i don’t fully understand
why this hack has the effect and why it is needed in the first
place. i hope that steve can weigh in and provide an explanation.

this seems to be solvable by changing:

neighbor->build_one(list);
into
neighbor->build_one(list,1);

then something weird seems to be going on with the neighbour lists when they are called from a compute_scalar() method right after a simulation is started and I don’t think it is related to the NEIGHMASK.

The preflag=0/1 setting in build_one() is supposed to help it be smart
about when it actually needs to re-build the list. No need to do it if the
regular neighbor lists have not changed. The preflag logic is different
depending on when during the timestep the neighbor list is requested
(before or after the regular neigh lists are rebuilt). Most requests come
after, so the default is preflag = 0.

When is your compute wanting a neigh list, i.e. what does "right after a
simulation starts" mean? Is this happening in setup as opposed to during
the timestep? What are the values of mylist->lastbuild and lastcall when
your compute first invokes this build_one() method?

Also, there is no harm in calling this with preflag=1; it just means it
might be slightly inefficient and rebuild too often.

Steve

Hi Steve!

It happens on the first call to compute->compute_scalar() after the simulation is continued. I do a first run of 10000 steps and then start the simulation again with “run 10000”. When the assertion fails, the values of last_build and neighbor->lastcall are as follows:

last_build_pre -> 10000, last_build_post -> 10000
lastcall_pre -> 10000, lastcall_post -> 10000

pre stands for the value before the call to build_one and post for the value after.

As Axel wrote, doing the following seems to do the trick:

// force a fresh build (preflag=1) on the very first invocation,
// otherwise use the default behavior
if (invoked_peratom < 0) neighbor->build_one(list,1);
else neighbor->build_one(list,0);

@Axel: Thanks for this fix!

Best,
Clemens

Hi again!

I’ve done some more testing on this issue. Explicitly rebuilding the neighbor list with

neighbor->build_one(list,1);

solves the problem in this instance, but I run into the same issue at other points of the simulation (I continue simulations a lot, because I use LAMMPS to do the MD for some other code I’ve written, so there are a lot of short trajectories).

Do you have an idea where to start looking into the neighbor lists?

Best,
Clemens

Just neighbor.cpp.

Steve