[lammps-users] MPI process will always be terminated unexpectedly at timestep 100

Hi, LAMMPSians,

I tried to write a new fix, based on “fix_evaporate.cpp” and “fix_heat.cpp”, that simulates irradiation and deletes atoms out of a defined region. The variable “nevery” means the new fix takes effect every this many steps.
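For readers following the thread, the usual meaning of an "every this many steps" setting can be sketched as below. This is a standalone illustration, not the code from the attached fix and not a LAMMPS API; the function name is_due and its arguments are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper (not a LAMMPS function): returns true when an action
// configured with period `nevery` is due on timestep `step`.
bool is_due(long step, int nevery) {
    assert(nevery > 0);          // a zero period would divide by zero below
    return step % nevery == 0;   // due on steps 0, nevery, 2*nevery, ...
}
```

Under this reading, an action with nevery = 100 is due at step 100, while one with nevery = 1000 is not due again until step 1000.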

When I set “nevery” to 100 or less, the program runs successfully. However, if “nevery” is larger than 1000, the MPI process is always terminated unexpectedly at timestep 100! The screen output is shown below:

[…]
78 152.22001 199.19884 -29607.275 91.461014 -29515.814 -0.0001 200.2201
79 141.32924 188.48716 -29601.196 85.8254 -29515.37 -0.0001 200.2201
80 129.2242 176.14692 -29594.291 79.425406 -29514.866 -0.0001 200.2201
81 120.84583 167.64716 -29589.37 74.985074 -29514.385 -0.0001 200.2201
82 119.83143 167.0675 -29588.529 74.523784 -29514.005 -0.0001 200.2201
83 127.00675 175.3953 -29592.276 78.51562 -29513.76 -0.0001 200.2201
84 140.05614 190.10145 -29599.33 85.696724 -29513.634 -0.0001 200.2201
85 154.49226 206.18888 -29607.16 93.597983 -29513.562 -0.0001 200.2201
86 165.49278 218.23245 -29613.024 99.557768 -29513.466 -0.0001 200.2201
87 169.81845 222.54468 -29615.061 101.78148 -29513.279 -0.0001 200.2201
88 167.0327 218.64439 -29613.009 100.03353 -29512.976 -0.0001 200.2201
89 159.54917 209.46514 -29608.281 95.701463 -29512.58 -0.0001 200.2201
90 151.53251 200.17489 -29603.367 91.210433 -29512.156 -0.0001 200.2201
91 147.13013 195.98482 -29600.792 89.000998 -29511.791 -0.0001 200.2201
92 148.72752 199.82675 -29602.014 90.469119 -29511.545 -0.0001 200.2201
93 155.9623 211.02468 -29606.783 95.353123 -29511.43 -0.0001 200.2201
94 165.9337 225.62667 -29613.264 101.86262 -29511.402 -0.0001 200.2201
95 174.48138 238.15025 -29618.847 107.46181 -29511.385 -0.0001 200.2201
96 177.96633 243.86713 -29621.239 109.9353 -29511.304 -0.0001 200.2201
97 174.80196 240.66009 -29619.396 108.28715 -29511.109 -0.0001 200.2201
98 166.11524 229.76577 -29613.923 103.12583 -29510.797 -0.0001 200.2201
99 155.29171 215.21675 -29606.819 96.409425 -29510.409 -0.0001 200.2201
MPI process terminated unexpectedly
Exit code -5 signaled from a440
Killing remote processes…DONE
Signal 15 received.

I have no idea what the problem with the program is. The attached files are the fix I wrote. Could you help me check them and suggest possible reasons for this error?

Thank you very much!

Wenpeng Zhu

2010-03-27

fix_irradiation.h (1.23 KB)

fix_irradiation.cpp (8.18 KB)

first of all, does it run in serial?

debugging in parallel is always difficult,
since many MPI environments don't forward
error output and codes properly.

axel.

2010/3/26 Wenpeng Zhu <[email protected]>:

This error is typical of a bug in your code: one of your
processors crashed, which in turn crashed MPI. Running in serial
is a good first step. Try putting in print statements to see where you
are crashing, or run under an error checker like valgrind.

Steve
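The print-statement approach suggested above can be sketched as follows. This is a minimal illustration under assumptions, not code from the attached fix: the helper name trace and its parameters are hypothetical, and the rank and step values would have to come from the caller. The point is to flush after every message, so the last line printed before a crash is not lost in a buffer.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of LAMMPS): write one trace line and flush
// immediately, so the message is not left in a buffer if the process
// crashes right afterwards.
void trace(std::ostream &out, int rank, long step, const std::string &where) {
    std::ostringstream line;
    // Build the whole line first so it is handed to the stream in one call.
    line << "[rank " << rank << "] step " << step << ": " << where << '\n';
    out << line.str();
    out.flush();
}
```

Calling something like trace(std::cerr, me, step, "before deleting atoms") at the start and end of each method of the new fix (with <iostream> included, and with me standing for the process rank, both assumptions here) narrows the crash down to the code after the last message that appears.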
