fix store/state segfault + setting eflag_atom from fix

Shern_Tee · February 11, 2020, 3:16am

Dear all,

(1) I have a copy of LAMMPS (9 Jan 2020 compiled with GCC) which
sometimes segfaults when asked to initialize a fix store/state with a
per-atom compute:

compute foo all pe/atom pair #also with ke/atom, chunk/atom type
#run 0
fix foobar all store/state 1 c_foo

or per-atom variable:

variable foo atom x*y
#run 0
fix foobar all store/state 1 v_foo

I have tested this on two different systems (an all-atom ionic liquid in
bulk with 4800 atoms, and an acetonitrile-solute system between
electrodes from lammps-conp with 432 atoms). In both cases I get an
occasional segfault when starting up the run (with the run proceeding
normally otherwise), with the 4800-atom system segfaulting more often
than the 432-atom system. Inserting "run 0" before the store/state fix
prevents the segfault, suggesting that some memory is (sometimes!) not
properly assigned before the system has been fully set up.

Valgrind traces the error to line 550 in end_of_step() in
fix_store_state.cpp:

if (cfv_any && nevery) {
  const bigint nextstep = (update->ntimestep/nevery)*nevery + nevery;
  modify->addstep_compute(nextstep);
}

but I don't understand why that went wrong. Is it that the update /
modify structures need a run 0 to be properly assigned?

Also, I have two copies of LAMMPS (on a HPC) compiled with Intel
compilers, dating to 5 Jun 2019 and 9 Jan 2020 respectively, which do
not replicate this behavior (that is, fix store/state works with no
segfault, even without "run 0").

What can I do about this bug (assuming it replicates on other machines)?

akohlmey · February 11, 2020, 5:13am

(1) Thanks for reporting.
Indeed, a run 0 is required to properly update the list of computes that need data collected during a run, so calling modify->addstep_compute() before that has happened is not valid.
However, there is an alternate function, Modify::addstep_compute_all(), that can always be called. With the attached patch, any call to Modify::addstep_compute() is deferred to that alternate function if the run has not yet been initialized.
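
To make the deferral idea concrete, here is a minimal standalone sketch. It is not the attached patch and not the real Modify class: ToyModify, ToyCompute, timeflag_list and the initialized flag are hypothetical names chosen for illustration, and bigint is assumed to be a 64-bit integer. It only shows the pattern of falling back to the "all computes" variant while the list built during initialization does not exist yet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using bigint = std::int64_t;  // assumption: stand-in for LAMMPS' bigint

// Hypothetical stand-in for a compute that records the steps it was asked for.
struct ToyCompute {
  std::vector<bigint> requested_steps;
  void addstep(bigint step) { requested_steps.push_back(step); }
};

// Hypothetical stand-in for Modify; all member names are illustrative only.
struct ToyModify {
  std::vector<ToyCompute> computes;        // every compute defined so far
  std::vector<std::size_t> timeflag_list;  // indices collected during init()
  bool initialized = false;                // false until a run has been set up

  // Build the list that addstep_compute() depends on.
  void init() {
    timeflag_list.clear();
    for (std::size_t i = 0; i < computes.size(); ++i) timeflag_list.push_back(i);
    initialized = true;
  }

  // Safe at any time: loops over all computes, needs no prebuilt list.
  void addstep_compute_all(bigint step) {
    for (auto &c : computes) c.addstep(step);
  }

  // Uses the prebuilt list; defers to the "all" variant before init().
  void addstep_compute(bigint step) {
    if (!initialized) {
      addstep_compute_all(step);
      return;
    }
    for (std::size_t i : timeflag_list) computes[i].addstep(step);
  }
};
```

In this toy version, calling addstep_compute() before init() still registers the step with every compute instead of walking a list that has not been built.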

(2) You signal the need to collect data by calling modify->addstep_compute(nextstep), where nextstep is a bigint variable holding the next timestep on which the data is required.
This is what fix store/state, fix ave/atom, and similar fixes do.
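
As an illustration of how nextstep can be computed, the snippet below isolates the arithmetic from the end_of_step() excerpt quoted earlier in the thread. The function name next_store_step is hypothetical, and bigint is assumed to be a 64-bit integer; inside a fix the result would be what gets passed to modify->addstep_compute().

```cpp
#include <cassert>
#include <cstdint>

using bigint = std::int64_t;  // assumption: stand-in for LAMMPS' bigint

// Hypothetical helper: the first multiple of nevery strictly after ntimestep.
// Integer division rounds ntimestep down to a multiple of nevery, then one
// more interval is added.
bigint next_store_step(bigint ntimestep, int nevery)
{
  assert(nevery > 0);  // the quoted code guards this with "if (... && nevery)"
  return (ntimestep / nevery) * nevery + nevery;
}
```

For example, with nevery = 10 a current step of 7 gives 10, and a current step of 10 gives 20, so a step that lands exactly on a multiple always requests the following one.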

Axel.

modify-addstep-noinit.diff (1.33 KB)