[lammps-users] pppm causes seg fault error

Dear LAMMPS users,
I’m not sure if this is the right place to post this. I have been experiencing an error when my run reaches a write_restart command (the restart command works fine). The odd thing is that the problem only occurs when I run on Redsky (a computing cluster at Sandia). I have been using pppm in slab mode, and that seems to be causing the problem. Here is the onset of the error:

Total # of neighbors = 5658728
Ave neighs/atom = 229.544
Ave special neighs/atom = 0
Neighbor list builds = 9
Dangerous builds = 0
System init for write_restart ...
PPPM initialization ...
[rs862:10649] *** Process received signal ***
[rs862:10649] Signal: Segmentation fault (11)
[rs862:10649] Signal code: (-6)
[rs862:10649] Failing at address: 0x108e400002999
[rs862:10649] [ 0] /lib64/libpthread.so.0 [0x2b13dd5524c0]
[rs862:10649] [ 1] /lib64/libpthread.so.0(raise+0x2d) [0x2b13dd55238d]
[rs862:10649] [ 2] /projects/global/x86_64/compilers/intel/intel-11.1-cproc-064/mkl/lib/em64t/libiomp5.so [0x2b13dd3ea4a2]
[rs862:10649] *** End of error message ***

… and so on for all other processors …

I’ve also attached my input deck. Can anybody help? Thanks.

Jonathan

in.water (2.11 KB)


hmmm.... this looks like a threading-related problem.
MKL by default tries to multi-thread across all available
cores, and that typically results in all kinds of problems
(not to mention that it doesn't improve performance at all
in this case). you can set the OMP_NUM_THREADS environment
variable to 1 to turn this behavior off.
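A minimal sketch of applying this from a Python launcher script. It assumes Open MPI's `mpirun`, whose `-x` option exports an environment variable to every rank, and a LAMMPS binary named `lmp` that reads its input through the `-in` switch. The helper `build_launch`, the binary name, and the rank count are hypothetical placeholders.

```python
import os


def build_launch(nprocs, executable, input_file, base_env=None):
    """Return (argv, env) for an MPI launch with OpenMP/MKL threading pinned to 1.

    Hypothetical helper: executable name and rank count are placeholders.
    """
    # Copy the environment so the caller's dict (or os.environ) is not modified.
    env = dict(os.environ if base_env is None else base_env)
    # Limit the OpenMP runtime that MKL uses (libiomp5) to a single thread.
    env["OMP_NUM_THREADS"] = "1"
    argv = [
        "mpirun",
        "-np", str(nprocs),
        # Open MPI: forward OMP_NUM_THREADS to ranks on remote nodes as well.
        "-x", "OMP_NUM_THREADS",
        executable, "-in", input_file,
    ]
    return argv, env


if __name__ == "__main__":
    argv, env = build_launch(64, "lmp", "in.water")
    print(" ".join(argv))
```

Passing the returned `env` to `subprocess.run(argv, env=env)` would start the job. A variable set only in the local shell may not reach ranks on other nodes, which is why the sketch also forwards it with `-x`.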

second possibility is that you are linking the wrong
fftw wrapper. with the fftw2 wrappers it is very easy
to mix up single and double precision, since the API
is exactly the same while the ABI is not.
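To make the API-versus-ABI point concrete: in fftw2 the complex type is a pair of `fftw_real` values. `fftw_real` is `float` in a single-precision build and `double` in the default build, while the function names stay identical. The sketch below mirrors the two layouts with `ctypes`. The structure names and `overrun_bytes` are hypothetical stand-ins, not FFTW's headers. It computes how many bytes a library routine would touch beyond the caller's buffer when the header used at compile time and the linked library disagree.

```python
import ctypes


class ComplexSingle(ctypes.Structure):
    """Layout of fftw2's fftw_complex in a single-precision (float) build."""
    _fields_ = [("re", ctypes.c_float), ("im", ctypes.c_float)]


class ComplexDouble(ctypes.Structure):
    """Layout of fftw2's fftw_complex in a double-precision build."""
    _fields_ = [("re", ctypes.c_double), ("im", ctypes.c_double)]


def overrun_bytes(n, caller_type, library_type):
    """Bytes the library touches beyond what the caller allocated for n values.

    A positive result means out-of-bounds reads/writes (a likely segfault).
    Zero means the sizes agree or the library touches less memory; in the
    mismatched case the data is then silently misinterpreted instead.
    """
    allocated = n * ctypes.sizeof(caller_type)
    touched = n * ctypes.sizeof(library_type)
    return max(0, touched - allocated)


if __name__ == "__main__":
    n = 64 * 64 * 64  # hypothetical FFT grid size
    # Caller compiled against a single-precision header, linked to a double library:
    print(overrun_bytes(n, ComplexSingle, ComplexDouble))
    # The reverse mix-up does not overrun, but the values are misinterpreted:
    print(overrun_bytes(n, ComplexDouble, ComplexSingle))
```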

if you have the time, please have a look at the massively
expanded and revised FFT interface in the LAMMPS-ICMS
branch. it supports MKL's FFTs directly, without going
through the fftw2 or fftw3 wrappers, as well as many other
FFT libraries, and it fully supports single-precision FFTs
(based on an implementation from phil blood).

http://sites.google.com/site/akohlmey/software/lammps-icms

to debug this, i would need the data file as well.

cheers,
   axel.

Hi Axel,
Thanks for your response. I tried your first suggestion (setting OMP_NUM_THREADS to 1), but it didn’t help. I suspect your second suggestion (linking the wrong fftw wrapper) is right, and I’m trying to fix that.
I originally tried attaching my data file as well, but it was too large for the user list. I’ll send it to you separately.

Jonathan

I swapped in a new set of modules and it is working now. Thanks for the help.

Jonathan

Here are the modules that I loaded (for Redsky), FYI:

compilers/intel-11.1-f064-c064
mpi/openmpi-1.4.1_oobpr_intel-11.1-f064-c064
misc/env-openmpi-1.4-oobpr
libraries/fftw-2.1.5_openmpi-1.4.1_oobpr_intel-11.1-f064-c064

Jonathan
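One way to read the module list above is that the MPI and FFTW modules carry the same compiler tag (`intel-11.1-f064-c064`), and the FFTW module names the same MPI build (`openmpi-1.4.1_oobpr`). The following is a small sketch that checks a module list for that kind of consistency. It assumes the naming convention inferred from these four module names holds elsewhere on the cluster, and the helper `toolchain_mismatches` is hypothetical. Module names like these say nothing about single versus double precision, so the check cannot catch that particular mix-up.

```python
def toolchain_mismatches(modules):
    """Return human-readable problems found in a list of module names.

    Assumes the naming scheme seen in this thread: 'compilers/<tag>',
    'mpi/<mpi-build>_<tag>', and 'libraries/<lib>_<mpi-build>_<tag>'.
    """
    def name_of(category):
        # First loaded module in the given category, without the category prefix.
        for m in modules:
            if m.startswith(category + "/"):
                return m.split("/", 1)[1]
        return None

    compiler = name_of("compilers")
    mpi = name_of("mpi")
    if compiler is None or mpi is None:
        return ["no compilers/ or mpi/ module loaded"]

    suffix = "_" + compiler
    problems = []
    if not mpi.endswith(suffix):
        problems.append(f"mpi/{mpi}: not built with {compiler}")
    # MPI build tag (e.g. 'openmpi-1.4.1_oobpr') with the compiler suffix removed.
    mpi_build = mpi[: -len(suffix)] if mpi.endswith(suffix) else mpi

    for m in modules:
        if not m.startswith("libraries/"):
            continue
        lib = m.split("/", 1)[1]
        if not lib.endswith(suffix):
            problems.append(f"{m}: not built with {compiler}")
        if mpi_build not in lib:
            problems.append(f"{m}: not built against {mpi_build}")
    return problems


if __name__ == "__main__":
    loaded = [
        "compilers/intel-11.1-f064-c064",
        "mpi/openmpi-1.4.1_oobpr_intel-11.1-f064-c064",
        "misc/env-openmpi-1.4-oobpr",
        "libraries/fftw-2.1.5_openmpi-1.4.1_oobpr_intel-11.1-f064-c064",
    ]
    print(toolchain_mismatches(loaded) or "consistent")
```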