memory issue with fix_vector in stable-9Dec14

_Barnes_Brian_C_CTR · December 23, 2014, 4:59pm

Hello,

I’m running into a problem with fix vector in the latest stable LAMMPS (9 Dec 2014). I’ve reduced it to a short test case (a modification of the LJ melt example; it runs in under 1 second on 1 core). It crashes on two distinct HPC systems, with both serial and MPI executables. I suspect the memory allocation in fix_vector.cpp’s init() does not occur when a run command with “pre no” is used. Other data-related fixes, such as ave/time, work if inserted in its place. It also works if “pre no” in the final run command is changed to “pre yes”. In my actual use case I need the “pre no” run command, so I’d be very interested in a patch if one is made. The test case input and output are pasted at the end of this mail.

Also, there’s a broken link on http://lammps.sandia.gov/doc/fix.html – the vector command link goes to vector.html instead of fix_vector.html. I’ve really enjoyed using LAMMPS and have found its documentation and mailing list archives to be quite helpful! Thanks for all your work.

sincerely,

Brian Barnes

-----INPUT-----

# 3d Lennard-Jones melt

units lj
atom_style atomic

lattice fcc 0.8442
region box block 0 10 0 10 0 10
create_box 1 box
create_atoms 1 box
mass 1 1.0

velocity all create 3.0 87287

pair_style lj/cut 2.5
pair_coeff 1 1 1.0 1.0 2.5

neighbor 0.3 bin
neigh_modify every 20 delay 0 check no

fix 1 all nve

thermo 1

#run 10 # uncommenting this line delays the crash
fix 2 all vector 1 c_thermo_temp # this fix causes crash with later ‘pre no’ run
run 10 pre yes post no
run 10 pre no post yes
-----END INPUT-----

-----OUTPUT-----
LAMMPS (9 Dec 2014)
Lattice spacing in x,y,z = 1.6796 1.6796 1.6796
Created orthogonal box = (0 0 0) to (16.796 16.796 16.796)
1 by 1 by 1 MPI processor grid
Created 4000 atoms
Setting up run …
Memory usage per processor = 2.19271 Mbytes
Step Temp E_pair E_mol TotEng Press
0 3 -6.7733681 0 -2.2744931 -3.7033504
1 2.9951987 -6.7662407 0 -2.2745659 -3.6588262
2 2.9802109 -6.7439895 0 -2.2747908 -3.5226206
3 2.9532502 -6.7039491 0 -2.2751813 -3.2867189
4 2.9112587 -6.6412126 0 -2.2754163 -2.9370172
5 2.8498812 -6.5487251 0 -2.274972 -2.4537522
6 2.7636568 -6.4191689 0 -2.2747202 -1.8155961
7 2.6467777 -6.2451068 0 -2.2759327 -1.0023218
8 2.4951145 -6.0194029 0 -2.2776669 -0.0006588972
9 2.3100696 -5.742567 0 -2.2783289 1.1746157
10 2.1031859 -5.4317227 0 -2.2777326 2.4556252
Loop time of 0.0311601 on 1 procs for 10 steps with 4000 atoms
Step Temp E_pair E_mol TotEng Press
10 2.1031859 -5.4317227 0 -2.2777326 2.4556252
11 1.8973697 -5.1217377 0 -2.2763946 3.7176309
_pmiu_daemon(SIGCHLD): [NID 00017] [c0-0c0s4n1] [Tue Dec 23 16:47:34 2014] PE RANK 0 exit signal Segmentation fault
Application 4821326 exit codes: 139
Application 4821326 resources: utime ~0s, stime ~1s, Rss ~124240, inblocks ~12868, outblocks ~33468
-----END OUTPUT-----

The binary used to run that example was created with an Intel compiler (icc 14.0.2).

sjplimp · December 24, 2014, 3:38pm

Thanks for the details; this sounds like a bug. I’ll take a look, but can’t post a patch until after the holidays.

Steve

sjplimp · January 5, 2015, 4:40pm

By doing this:

run 10 pre yes post no
run 10 pre no post yes

you are explicitly skipping the logic in the fix vector init() that reallocates the length of the stored vector on the 2nd run. LAMMPS has no way of knowing you will do this.
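
For readers who want to see the mechanism in isolation, here is a minimal Python sketch of the behavior described above. It is a toy model, not the code in fix_vector.cpp: the names ToyVectorFix, run, capacity and count are made up for illustration, and the sizing rule (one entry per step from step 0 through the last step of the run) is an assumption based on the description in this thread. Where the C++ code would write past the end of its array, the sketch raises IndexError instead.

```python
class ToyVectorFix:
    """Hypothetical stand-in for a fix that stores one value every `nevery` steps."""

    def __init__(self, nevery=1):
        self.nevery = nevery
        self.capacity = 0   # entries allocated so far
        self.count = 0      # entries actually stored

    def init(self, last_step):
        # Assumed sizing rule: room for every sample from step 0 to last_step.
        self.capacity = last_step // self.nevery + 1

    def end_of_step(self):
        # In C++ this would be an out-of-bounds write; here we raise instead.
        if self.count >= self.capacity:
            raise IndexError("write past end of stored vector")
        self.count += 1


def run(fix, first_step, nsteps, pre=True):
    """Advance nsteps; init() and the initial sample happen only if pre is True."""
    last_step = first_step + nsteps
    if pre:
        fix.init(last_step)
        if fix.count == 0:
            fix.end_of_step()   # sample the very first step once
    for step in range(first_step + 1, last_step + 1):
        if step % fix.nevery == 0:
            fix.end_of_step()
    return last_step


fix = ToyVectorFix()
step = run(fix, 0, 10, pre=True)       # sized for steps 0..10
try:
    run(fix, step, 10, pre=False)      # init() skipped, storage not regrown
    overflowed = False
except IndexError:
    overflowed = True
print(fix.capacity, fix.count, overflowed)
```

In this model the second run fails on its first stored step (step 11), which is consistent with the log above, where the crash appears right after step 11 is printed.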

However, I changed the logic so that if
you do this:

run 10 pre yes post no start 0 stop 20
run 10 pre no post yes

it will now work. This allows LAMMPS to do
the allocation of the vector to the full length
on the first run. This is in keeping with this
comment on the run doc page:

IMPORTANT NOTE: If your input script changes the system between 2
runs, then the initial setup must be performed to insure the change is
recognized by all parts of the code that are affected. Examples are
adding a “fix”_fix.html or “dump”_dump.html or “compute”_compute.html,
changing a “neighbor”_neigh_modify.html list parameter, or writing
restart file which can migrate atoms between processors. LAMMPS has
no easy way to check if this has happened, but it is an error to use
the {pre no} option in this case.

and with the way the start/stop keywords are used by
other commands that allow multiple runs with “pre no”.
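
The sizing arithmetic behind the start/stop approach can be sketched as below. This is only an illustration under stated assumptions: entries_needed is a hypothetical helper, the initial step is assumed to be a multiple of nevery, and the formula has not been checked against the patched fix_vector.cpp.

```python
def entries_needed(initial_step, end_step, nevery):
    """Samples taken every `nevery` steps from initial_step through end_step.

    Hypothetical helper; assumes initial_step is a multiple of nevery.
    """
    # Last multiple of nevery at or before end_step.
    final_step = (end_step // nevery) * nevery
    return (final_step - initial_step) // nevery + 1


# Sized from the first run alone (steps 0..10): too small for a second run.
print(entries_needed(0, 10, 1))   # 11
# Sized from "stop 20": covers both runs, so "pre no" on the second is safe.
print(entries_needed(0, 20, 1))   # 21
```

With “stop 20” on the first run, the storage is sized for all 21 samples up front, so the second run never needs the reallocation that “pre no” skips.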

I’ll post a patch later today.

Steve