I have built LAMMPS (18 Feb 2011 version) for GPUs using Open MPI, the GNU compilers, and CUDA 3.2. My test run, on 2 nodes with 8 cores each and 4 GPUs, fails with:
[tesla2:00561] Signal: Segmentation fault (11)
[tesla2:00561] Signal code: Address not mapped (1)
My GPU setup is:
fix 0 all gpu force/neigh 0 1 -1
and I am using pair style:
lj/cut/coul/long
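As a quick consistency check on this setup: assuming the fix gpu arguments are, in order, the mode, the first and last GPU IDs to use on each node, and the CPU/GPU split (a negative split requesting dynamic load balancing), and assuming one MPI rank per core, GPUs 0 and 1 on each node are shared as follows. This agrees with the "with 4 procs per device" line in the log.

```latex
\frac{2\ \text{nodes} \times 8\ \text{ranks/node}}{2\ \text{nodes} \times 2\ \text{GPUs/node}}
  = \frac{16\ \text{ranks}}{4\ \text{GPUs}}
  = 4\ \text{ranks per GPU}
```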
Below is the output from around the point where things start to go wrong:
#1
fix 1 all nvt temp 600.0 600.0 100.0
velocity all create 600 58447419
run 50000
Ewald initialization ...
G vector = 0.182103
vectors: actual 1d max = 11443 17 21437
---------------------------------------------------------------------
GPU Time Info (average):
---------------------------------------------------------------------
Average split: 0.9995.
Max Mem / Proc: 0.75 MB.
---------------------------------------------------------------------
--------------------------------------------------------------------------
- Using GPGPU acceleration for lj/cut/coul/long:
- with 4 procs per device.
--------------------------------------------------------------------------
GPU 0: Tesla T10 Processor, 240 cores, 3.9/4 GB, 1.4 GHZ (Mixed Precision)
GPU 1: Tesla T10 Processor, 240 cores, 3.9/4 GB, 1.4 GHZ (Mixed Precision)
--------------------------------------------------------------------------
Initializing GPU and compiling on process 0...Done.
Initializing GPUs 0-1 on core 0...Done.
Initializing GPUs 0-1 on core 1...Done.
Initializing GPUs 0-1 on core 2...Done.
Initializing GPUs 0-1 on core 3...Done.
Setting up run ...
Memory usage per processor = 13.2739 Mbytes
Step TotEng PotEng KinEng Temp Press Volume E_vdwl E_coul E_bond E_angle E_dihed
192 13681.396 7337.6365 6343.7599 600 286.40435 1000000 4235.9953 3538.9173 519.48009 2274.5046 376.66896
...
16500 19746.546 13416.608 6329.9374 598.69264 -35.471398 1000000 4463.6599 3522.1983 2997.7977 4341.7209 1707.243
17000 19699.376 13433.31 6266.0661 592.65163 67.47609 1000000 4548.828 3522.4822 2944.6102 4314.4425 1716.4254
[tesla2:00561] *** Process received signal ***
[tesla2:00561] Signal: Segmentation fault (11)
[tesla2:00561] Signal code: Address not mapped (1)
[tesla2:00561] Failing at address: 0xfffffffe034017f0
[tesla2:00561] [ 0] /lib64/libpthread.so.0 [0x341320eb10]
[tesla2:00561] [ 1] ./lmp_openmpi-gpu(_ZN9LAMMPS_NS20PairLJCutCoulLongGPU11cpu_computeEPiiii+0x14b) [0x6d8aeb]
[tesla2:00561] [ 2] ./lmp_openmpi-gpu(_ZN9LAMMPS_NS20PairLJCutCoulLongGPU7computeEii+0x309) [0x6d97c9]
[tesla2:00561] [ 3] ./lmp_openmpi-gpu(_ZN9LAMMPS_NS6Verlet3runEi+0x195) [0x7514e5]
[tesla2:00561] [ 4] ./lmp_openmpi-gpu(_ZN9LAMMPS_NS3Run7commandEiPPc+0x284) [0x728814]
[tesla2:00561] [ 5] ./lmp_openmpi-gpu(_ZN9LAMMPS_NS5Input15execute_commandEv+0xa28) [0x63fa08]
[tesla2:00561] [ 6] ./lmp_openmpi-gpu(_ZN9LAMMPS_NS5Input4fileEv+0x3a0) [0x641300]
[tesla2:00561] [ 7] ./lmp_openmpi-gpu(main+0x4b) [0x64990b]
[tesla2:00561] [ 8] /lib64/libc.so.6(__libc_start_main+0xf4) [0x341261d994]
[tesla2:00561] [ 9] ./lmp_openmpi-gpu(__gxx_personality_v0+0x481) [0x487d29]
[tesla2:00561] *** End of error message ***
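The LAMMPS frames in this backtrace are mangled C++ names; in general `c++filt` from GNU binutils decodes them. The sketch below is only a toy decoder for the small subset of Itanium C++ ABI names that appear here (a nested name followed by int/char/void and pointer parameters). The helper names `demangle_simple` and `parse_frame` are hypothetical, the frame format is assumed from the Open MPI lines above, and anything outside that subset raises `ValueError`. Decoded, the top LAMMPS frame is `LAMMPS_NS::PairLJCutCoulLongGPU::cpu_compute(int*, int, int, int)`, called from `PairLJCutCoulLongGPU::compute(int, int)`, so the fault is in the CPU-side routine of the GPU pair style, presumably the one handling the share of atoms kept on the host (the log reports an average split of 0.9995).

```python
import re

# Itanium C++ ABI builtin type codes; only the few needed for the frames in
# this backtrace are included (a toy subset, not a full demangler).
BUILTINS = {"v": "void", "i": "int", "c": "char", "l": "long", "d": "double"}

# Assumed Open MPI frame format, e.g.
# "[tesla2:00561] [ 1] ./lmp_openmpi-gpu(_ZN...Piiii+0x14b) [0x6d8aeb]".
# Frames without "(symbol+offset)", such as libpthread, do not match.
FRAME_RE = re.compile(r"\[\s*(\d+)\]\s+(\S+?)\((\w+)\+(0x[0-9a-fA-F]+)\)")


def demangle_simple(sym):
    """Decode _ZN<len><name>...E<params>; return sym unchanged if not _ZN."""
    if not sym.startswith("_ZN"):
        return sym  # plain symbols such as main are not mangled
    pos, parts = 3, []
    while pos < len(sym) and sym[pos] != "E":
        m = re.match(r"\d+", sym[pos:])  # each name component is <length><chars>
        if m is None:
            raise ValueError(f"unsupported name encoding in {sym!r}")
        n = int(m.group())
        pos += len(m.group())
        parts.append(sym[pos:pos + n])
        pos += n
    if pos >= len(sym):
        raise ValueError(f"missing 'E' terminator in {sym!r}")
    pos += 1  # skip the 'E' that closes the nested name
    params = []
    while pos < len(sym):
        stars = 0
        while pos < len(sym) and sym[pos] == "P":  # each P adds one pointer level
            stars += 1
            pos += 1
        if pos >= len(sym) or sym[pos] not in BUILTINS:
            raise ValueError(f"unsupported parameter type in {sym!r}")
        params.append(BUILTINS[sym[pos]] + "*" * stars)
        pos += 1
    if params == ["void"]:
        params = []  # "Ev" encodes an empty parameter list
    return "::".join(parts) + "(" + ", ".join(params) + ")"


def parse_frame(line):
    """Return (index, binary, symbol, offset), or None for unsymbolized frames."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), m.group(2), m.group(3), m.group(4)


if __name__ == "__main__":
    # A few frames copied from the backtrace above.
    sample = [
        "[tesla2:00561] [ 0] /lib64/libpthread.so.0 [0x341320eb10]",
        "[tesla2:00561] [ 1] ./lmp_openmpi-gpu(_ZN9LAMMPS_NS20PairLJCutCoulLongGPU11cpu_computeEPiiii+0x14b) [0x6d8aeb]",
        "[tesla2:00561] [ 2] ./lmp_openmpi-gpu(_ZN9LAMMPS_NS20PairLJCutCoulLongGPU7computeEii+0x309) [0x6d97c9]",
        "[tesla2:00561] [ 3] ./lmp_openmpi-gpu(_ZN9LAMMPS_NS6Verlet3runEi+0x195) [0x7514e5]",
    ]
    for line in sample:
        frame = parse_frame(line)
        if frame is None:
            continue  # nothing to decode (e.g. the libpthread frame)
        idx, _binary, sym, off = frame
        print(f"[{idx}] {demangle_simple(sym)} +{off}")
```

Run as a script, this prints frames 1 to 3 as `LAMMPS_NS::PairLJCutCoulLongGPU::cpu_compute(int*, int, int, int) +0x14b`, `LAMMPS_NS::PairLJCutCoulLongGPU::compute(int, int) +0x309` and `LAMMPS_NS::Verlet::run(int) +0x195`. For a full log, piping it through `c++filt` is the more robust choice.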
I can send the complete output file if that will help.
System info:
[nucci@...2526... gpu2]$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)
[nucci@...2526... gpu2]$ mpicc -v
Using built-in specs.
COLLECT_GCC=/usr/global/gcc/4.5.1/bin/gcc
COLLECT_LTO_WRAPPER=/gpfs/apps/x86_64-rhel5/gcc/4.5.1/bin/../libexec/gcc/x86_64-unknown-linux-gnu/4.5.1/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../gcc-4.5.1/configure --prefix=/usr/global/gcc/4.5.1 --with-mpc=/usr/global/mpc/0.8.1 --with-mpfr=/usr/global/mpfr/2.4.2 --with-gmp=/usr/global/gmp/5.0.1 --with-ppl=/usr/global/ppl/0.10.2 --with-cloog=/usr/global/cloog-ppl/0.15.9 --disable-multilib
Thread model: posix
gcc version 4.5.1 (GCC)
[nucci@...2526... gpu2]$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2010 NVIDIA Corporation
Built on Wed_Nov__3_16:16:57_PDT_2010
Cuda compilation tools, release 3.2, V0.2.1221
[nucci@...2526... gpu2]$
I'd like to know whether anything obvious jumps out at the GPU experts here. As I mentioned, I can attach the entire output file if that would help.
Thanks,
--Jeff