Unusual hang with wall/piston

I’ve been testing fix_wall_piston from the SHOCK package and have run into unusual runtime behavior. The following input file, a reduced test case, reliably hangs on a Cray XC40 when built with the Intel (16.x) or GNU (4.9.x) compilers and run on 16 cores. It does NOT hang on 1 core, and may irregularly crash at various core counts in between. On an SGI ICE X it fairly reliably hangs on 32 cores, but sometimes completes on 64 cores and usually completes at lower core counts (Intel 15.x compiler). I admit I don’t usually use wall/piston for shock simulations; there are a variety of alternatives. I’m seeing this behavior with a 1-July version of LAMMPS downloaded from Axel’s repo, but recall seeing a similar problem with a much older version in the past.

Can anyone else reproduce a hang that depends on the number of cores using this input? Am I doing something invalid?

Input file, example hung output, and some Makefile details follow below. Thanks.

sincerely,

Brian

I cannot reproduce the hang, but I do see reads of uninitialized data when running your example under valgrind's memory checker. That can cause LAMMPS to take different branches on different MPI ranks, and if one of those branches contains a collective MPI operation, you can get a hang.
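
To make the failure mode concrete, here is a minimal standalone sketch (not LAMMPS code; the variable names are only illustrative) of how an uninitialized value can steer ranks into different branches around a collective:

#include <mpi.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  double vz;         /* deliberately uninitialized, like z0/vz before the patch */
  double sum = 0.0;

  /* the branch taken depends on whatever garbage happens to be in
     memory, so it can differ from rank to rank */
  if (vz > 0.0) {
    /* MPI_Allreduce is collective: every rank must call it, or the
       ranks that did call it block forever -- i.e. the hang */
    MPI_Allreduce(&vz, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  }

  MPI_Finalize();
  return 0;
}

Whether this completes depends on the stack contents each rank happens to see, which matches the machine- and core-count-dependent behavior reported above.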

Please apply the following (rather simple) change, try again, and let us know if that addresses your issue.

diff --git a/src/SHOCK/fix_wall_piston.cpp b/src/SHOCK/fix_wall_piston.cpp
index 443b267..3513d6e 100644
--- a/src/SHOCK/fix_wall_piston.cpp
+++ b/src/SHOCK/fix_wall_piston.cpp
@@ -51,7 +51,7 @@ FixWallPiston::FixWallPiston(LAMMPS *lmp, int narg, char **arg) :
   rampNL3flag = 0;
   rampNL4flag = 0;
   rampNL5flag = 0;
-  z0 = vz = 0.0;
+  t_target = z0 = vx = vy = vz = 0.0;
   xloflag = xhiflag = yloflag = yhiflag = zloflag = zhiflag = 0;

   int iarg = 3;
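
For reference, one typical way to reproduce the valgrind finding (assuming an MPI build named lmp_mpi and the reduced input saved as in.piston; adjust both names to your setup):

# run every rank under valgrind; --track-origins reports where the
# uninitialized value originated (slower, but far more informative)
mpirun -np 16 valgrind --track-origins=yes ./lmp_mpi -in in.piston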

Yes, the patch appears to fix this example; I tested with varying core counts on both the Cray and the SGI, using Intel compilers. Good valgrind catch, thanks Axel. I endorse this patch for mainline.

Brian