Problem with multi-core runs: Non-numeric box dimensions - simulation unstable

Dear all,

I recently developed a fix style for a point induced dipole polarization model. The fix uses the post_force method to add additional forces to atom->f. It also uses the compute_scalar method to return the added energy.

The code works fine when I run on 1, 2, or 4 cores, but it crashes when I run on 16 cores. Someone previously encountered this problem when they had two fix styles that changed the size of the simulation box, but I only have one fix npt. I hope someone can give me some advice. Thanks!

Han.

The log file is as follows:

LAMMPS (15 May 2015)
Reading data file …
orthogonal box = (0 0 0) to (19.71 19.71 19.71)
2 by 2 by 4 MPI processor grid
reading atoms …
768 atoms
scanning bonds …
2 = max bonds/atom
scanning angles …
1 = max angles/atom
reading bonds …
512 bonds
reading angles …
256 angles
Finding 1-2 1-3 1-4 neighbors …
Special bond factors lj: 0 0 0
Special bond factors coul: 0 0 0
2 = max # of 1-2 neighbors
1 = max # of 1-3 neighbors
1 = max # of 1-4 neighbors
2 = max # of special neighbors
768 atoms in group tip4p_def
512 atoms in group hydrogen
256 atoms in group oxygen
Finding SHAKE clusters …
0 = # of size 2 clusters
0 = # of size 3 clusters
0 = # of size 4 clusters
256 = # of frozen angles
PPPM initialization …
extracting TIP4P info from pair style
G vector (1/distance) = 0.24176
grid = 15 15 15
stencil order = 5
estimated absolute RMS force accuracy = 0.00261389
estimated relative force accuracy = 7.87164e-06
using double precision FFTs
3d grid and FFT values/proc = 2156 256
Neighbor list info …
5 neighbor list requests
update every 1 steps, delay 0 steps, check yes
master list distance cutoff = 15.3092
Setting up run …
Memory usage per processor = 9.23639 Mbytes
Step Time tt Temp TempAve Press PressAve PEAve_Mo DensAve Volume order
0 0 0 447.29159 0 -23545.674 0 0 0 7657.0216 0.494379
ERROR on proc 7: Non-numeric box dimensions - simulation unstable (…/pppm_tip4p.cpp:77)
ERROR on proc 14: Non-numeric box dimensions - simulation unstable (…/pppm_tip4p.cpp:77)
ERROR on proc 4: Non-numeric box dimensions - simulation unstable (…/pppm_tip4p.cpp:77)
ERROR on proc 5: Non-numeric box dimensions - simulation unstable (…/pppm_tip4p.cpp:77)
ERROR on proc 13: Non-numeric box dimensions - simulation unstable (…/pppm_tip4p.cpp:77)
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 5
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 7
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 14
ERROR on proc 1: Non-numeric box dimensions - simulation unstable (…/pppm_tip4p.cpp:77)
ERROR on proc 6: Non-numeric box dimensions - simulation unstable (…/pppm_tip4p.cpp:77)
ERROR on proc 8: Non-numeric box dimensions - simulation unstable (…/pppm_tip4p.cpp:77)
ERROR on proc 11: Non-numeric box dimensions - simulation unstable (…/pppm_tip4p.cpp:77)
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 4
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 13
ERROR on proc 10: Non-numeric box dimensions - simulation unstable (…/pppm_tip4p.cpp:77)
ERROR on proc 12: Non-numeric box dimensions - simulation unstable (…/pppm_tip4p.cpp:77)
ERROR on proc 15: Non-numeric box dimensions - simulation unstable (…/pppm_tip4p.cpp:77)
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 6
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 8
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 11
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 15
ERROR on proc 0: Non-numeric box dimensions - simulation unstable (…/pppm_tip4p.cpp:77)
ERROR on proc 2: Non-numeric box dimensions - simulation unstable (…/pppm_tip4p.cpp:77)
ERROR on proc 9: Non-numeric box dimensions - simulation unstable (…/pppm_tip4p.cpp:77)
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 10
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 12
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 9
ERROR on proc 3: Non-numeric box dimensions - simulation unstable (…/pppm_tip4p.cpp:77)
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 3

The input file is as follows:


your input is nearly impossible to read with all the variables and
things that are not really needed. i know they are convenient, but for
debugging from the outside they make it very convoluted. you should
strip your inputs to the absolute minimum when you want people to have
a serious look at them.

first of all, you should try running with fixed volume (and ideally
fix nve, to reduce complications) to get a better handle on whether
fix npt is confused or something else is going belly-up. also, you
should write thermo output every step and also output positions,
velocities and forces for all atoms at every step (with sorted
dumps), so you can compare the different runs and see exactly at which
step the trouble starts.
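Once the sorted per-atom dumps from, say, a 4-core and a 16-core run have been parsed, finding "exactly at which step the trouble starts" is a simple scan. The standalone C++ sketch below assumes the forces have already been read into memory (parsing is omitted); `ForceRecord`, `Frame` and `first_divergence` are hypothetical names, not LAMMPS code. It reports the first step and atom ID where the two runs disagree beyond a tolerance, or where either run holds a NaN/inf force.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// One per-atom record from a sorted dump (hypothetical in-memory layout).
struct ForceRecord {
  long id;            // atom ID
  double fx, fy, fz;  // force components
};

// All records for one dumped timestep; atoms must be sorted by ID.
struct Frame {
  long step;
  std::vector<ForceRecord> atoms;
};

struct Divergence {
  long step;  // timestep where the runs first disagree
  long id;    // atom ID where the runs first disagree
};

// Returns the first (step, atom ID) where the two runs differ by more than
// `tol` in any force component, where either value is NaN/inf, or where the
// atom IDs do not line up (dumps not sorted consistently).
std::optional<Divergence> first_divergence(const std::vector<Frame>& a,
                                           const std::vector<Frame>& b,
                                           double tol) {
  const std::size_t nframes = std::min(a.size(), b.size());
  for (std::size_t fr = 0; fr < nframes; ++fr) {
    const auto& fa = a[fr].atoms;
    const auto& fb = b[fr].atoms;
    const std::size_t natoms = std::min(fa.size(), fb.size());
    for (std::size_t i = 0; i < natoms; ++i) {
      if (fa[i].id != fb[i].id) return Divergence{a[fr].step, fa[i].id};
      const double da[3] = {fa[i].fx, fa[i].fy, fa[i].fz};
      const double db[3] = {fb[i].fx, fb[i].fy, fb[i].fz};
      for (int k = 0; k < 3; ++k) {
        if (!std::isfinite(da[k]) || !std::isfinite(db[k]) ||
            std::fabs(da[k] - db[k]) > tol) {
          return Divergence{a[fr].step, fa[i].id};
        }
      }
    }
  }
  return std::nullopt;  // no divergence found in the overlapping frames
}
```

Runs on different core counts sum forces in a different order, so expect tiny round-off differences. Pick `tol` well above round-off and look for the first large jump rather than the first nonzero difference.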

please note that there are *at least two* ways to get the
non-numeric box dimensions error: 1) when you have multiple fixes
trying to change the box, but also 2) when your forces go haywire
(become NaN). since the pressure is computed from the virial, which is
in turn computed from the forces, this has to be considered as well.
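The path from a single NaN force to a "non-numeric box" can be sketched in a few lines of standalone C++. This sketch makes several simplifying assumptions. It uses the simplified, non-periodic virial form P = N k_B T / V + W / (3V) with W = Σ r_i · f_i. `rescale_box_edge` is a hypothetical toy barostat, not the equations fix npt actually integrates. `box_is_numeric` only imitates the kind of check that raises the error; it is not the LAMMPS source.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Simplified (non-periodic) virial pressure:
//   P = N kB T / V + W / (3 V),  with  W = sum_i r_i . f_i
// kB_T stands in for kB * T; units are left abstract.
double virial_pressure(const std::vector<Vec3>& r, const std::vector<Vec3>& f,
                       double kB_T, double volume) {
  double w = 0.0;
  for (std::size_t i = 0; i < r.size(); ++i)
    w += r[i].x * f[i].x + r[i].y * f[i].y + r[i].z * f[i].z;
  return static_cast<double>(r.size()) * kB_T / volume + w / (3.0 * volume);
}

// Hypothetical toy barostat step (NOT fix npt): scale the box edge by a
// factor that depends on the pressure error. A NaN pressure gives a NaN edge.
double rescale_box_edge(double edge, double p, double p_target, double gain) {
  return edge * (1.0 + gain * (p - p_target));
}

// The kind of sanity check that turns a NaN box into an error message.
bool box_is_numeric(double edge) { return std::isfinite(edge); }
```

One NaN in any force component makes W, and therefore P, NaN. Any barostat that feeds P back into the box dimensions then produces a NaN box. The error therefore shows up in the box check even though the root cause is in the forces.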
in addition, since you have already established a track record of not
being particularly skilled at reading/understanding the communication
patterns and flow of control in LAMMPS, it is quite likely that you
have messed up something related to communication and are now running
into problems. unfortunately for you, this is practically impossible
to debug from the outside.
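The toy C++ program below is not LAMMPS code. It sketches one classic communication mistake in a fix that makes results depend on the processor count: adding force contributions to ghost atoms and never sending them back to the processors that own those atoms. It models a non-periodic 1D chain of harmonic bonds split into contiguous blocks, one per "processor". Each block computes the bonds whose left atom it owns, so the right atom of its last bond is a ghost copy. `Chain`, `serial_forces` and `decomposed_forces` are all hypothetical. In LAMMPS itself the ghost-to-owner step would be a reverse communication (the Fix class provides `pack_reverse_comm()`/`unpack_reverse_comm()` hooks), and the energy returned by `compute_scalar()` would need a sum over processors, for example an `MPI_Allreduce` with `MPI_SUM`. Treat that mapping as an assumption to check against the developer documentation for your LAMMPS version.

```cpp
#include <cstddef>
#include <vector>

// Toy 1D chain: harmonic bonds between atoms i and i+1 (non-periodic).
struct Chain {
  std::vector<double> x;  // atom positions
  double k = 1.0;         // spring constant
  double r0 = 1.0;        // rest length
};

struct ForceResult {
  std::vector<double> f;  // per-atom forces
  double energy = 0.0;    // total bond energy
};

// Reference result: every bond evaluated on a single "processor".
ForceResult serial_forces(const Chain& c) {
  ForceResult out{std::vector<double>(c.x.size(), 0.0), 0.0};
  for (std::size_t i = 0; i + 1 < c.x.size(); ++i) {
    const double s = (c.x[i + 1] - c.x[i]) - c.r0;  // bond stretch
    out.f[i] += c.k * s;      // left atom pulled toward right when stretched
    out.f[i + 1] -= c.k * s;  // right atom pulled toward left
    out.energy += 0.5 * c.k * s * s;
  }
  return out;
}

// Same bonds split over `nprocs` contiguous blocks. Forces on the ghost atom
// `hi` are only delivered to its owner if `reverse_comm` is true.
ForceResult decomposed_forces(const Chain& c, std::size_t nprocs,
                              bool reverse_comm) {
  const std::size_t n = c.x.size();
  ForceResult out{std::vector<double>(n, 0.0), 0.0};
  for (std::size_t p = 0; p < nprocs; ++p) {
    const std::size_t lo = p * n / nprocs;
    const std::size_t hi = (p + 1) * n / nprocs;  // owned atoms: [lo, hi)
    double ghost_force = 0.0;   // accumulated on the ghost copy of atom hi
    double local_energy = 0.0;  // this block's share of the energy
    for (std::size_t i = lo; i < hi && i + 1 < n; ++i) {
      const double s = (c.x[i + 1] - c.x[i]) - c.r0;
      out.f[i] += c.k * s;
      if (i + 1 < hi) out.f[i + 1] -= c.k * s;  // neighbor is owned here
      else ghost_force -= c.k * s;              // neighbor is a ghost
      local_energy += 0.5 * c.k * s * s;
    }
    // Reverse communication: hand the ghost's force back to its owner.
    if (reverse_comm && hi < n) out.f[hi] += ghost_force;
    // Global energy = sum of per-block energies (an MPI_SUM-style reduction).
    out.energy += local_energy;
  }
  return out;
}
```

With `reverse_comm` enabled the forces match the serial reference for any block count. Without it, the atom just past every block boundary silently loses a bond force. The more blocks there are, the more atoms are affected, so a bug like this can look harmless at low core counts and blow up at higher ones.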

good luck. you're going to need it. you are dealing with the parts of
the LAMMPS code that separate the amateurs from the professionals.
;-),
     axel.