LAMMPS segfaults when built with Kokkos and CUDA support

Hi,

I am trying to build LAMMPS (15May15) to run on a cluster with NVIDIA
Kepler GPUs. The cluster is running CentOS 6 with Intel CPUs.

I have been able to successfully compile and run LAMMPS with the GPU
and USER-OMP packages enabled. I now want to build with the Kokkos
package so I can compare the performance on the cluster. When I use
OpenMPI 1.8.5, GCC 4.8.4 and CUDA 7.0.28 to compile the attached
Makefile, the compilation is successful with no warnings or errors.
When I try to execute ./lmp_kokkos_cuda, however, the only output is
"Segmentation Fault". The same thing happens with GCC 4.9.2. This is
the output from GDB:

(gdb) backtrace
#0 __exchange_and_add_dispatch (this=0x0, __a=...) at
/data/opt/gcc-4.9.2/include/c++/4.9.2/ext/atomicity.h:84
#1 std::basic_string<char, std::char_traits<char>,
std::allocator<char> >::_Rep::_M_dispose (this=0x0, __a=...) at
/data/opt/gcc-4.9.2/include/c++/4.9.2/bits/basic_string.h:245

I also tried running it through valgrind; here's the output:
==90365== Invalid read of size 4
==90365== at 0x3657D4:
std::string::_Rep::_M_dispose(std::allocator<char> const&) [clone
.part.3] (atomicity.h:67)
==90365== Address 0x10 is not stack'd, malloc'd or (recently) free'd

Can anyone help with this problem? Is there any other debugging info
that I can provide?

Makefile.kokkos_cuda (2.89 KB)

KOKKOS is still under heavy development and considered experimental.
unless you want to do KOKKOS development yourself, you should not
install it.

before compiling with MPI, i would suggest compiling without it,
using the internal MPI STUBS library.

axel.

I understand, and we will primarily use GPU or USER-CUDA. I would
still like to try to get KOKKOS working though.

I recompiled it with MPI_STUBS and still get the exact same error.

Stan can probably advise.

Steve

why do you use -shared under LINKFLAGS in your makefile?

that flag is used to create a shared library, not a regular
executable. thus you get the segmentation fault.

this is not a KOKKOS problem after all.

axel.
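
One way to check the diagnosis above is to look at the ELF type of the binary that was produced: linking with -shared yields a shared object rather than a regular executable. The following is only a minimal sketch in Python. The function names elf_type and describe are hypothetical, and it assumes a toolchain that does not build position-independent executables by default, since those also report the shared-object type.

```python
import struct

# ELF e_type values from the ELF specification.
ET_EXEC = 2  # fixed-address executable
ET_DYN = 3   # shared object (also used for position-independent executables)


def elf_type(header: bytes) -> int:
    """Return the e_type field from the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 5 (EI_DATA) gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field immediately after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return e_type


def describe(path: str) -> str:
    """Say whether the file at `path` looks like an executable or a shared object."""
    with open(path, "rb") as f:
        e_type = elf_type(f.read(18))
    if e_type == ET_DYN:
        return "shared object (or position-independent executable)"
    if e_type == ET_EXEC:
        return "executable"
    return "other ELF type %d" % e_type
```

Calling describe("./lmp_kokkos_cuda") on the binary from this thread would be expected to report a shared object if -shared was in effect at link time; the standard file utility reports the same information.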

If I remove -shared I get this:

# mpicxx -std=c++11 -D__CUDA_ARCH__=350 -E -x c++
-DCUDA_DOUBLE_MATH_FUNCTIONS --std=c++11 -fopenmp -O3
-D__CUDA_PREC_DIV -D__CUDA_PREC_SQRT
-I"../../lib/kokkos/core/src" -I"../../lib/kokkos/containers/src"
-I"../../lib/kokkos/algorithms/src" -I"../../lib/kokkos/linalg/src"
-I"../" "-I/opt/cuda-7.0/bin/..//include" -m64 -g -gdwarf-2
"/tmp/tmpxft_0001ae4d_00000000-4_kokkos_depend.cudafe1.cpp" >
"/tmp/tmpxft_0001ae4d_00000000-14_kokkos_depend.ii"
# mpicxx -std=c++11 -c -x c++ --std=c++11 -fopenmp -O3
-I"../../lib/kokkos/core/src" -I"../../lib/kokkos/containers/src"
-I"../../lib/kokkos/algorithms/src" -I"../../lib/kokkos/linalg/src"
-I"../" "-I/opt/cuda-7.0/bin/..//include" -fpreprocessed -m64 -g
-gdwarf-2 -o "kokkos_depend.o"
"/tmp/tmpxft_0001ae4d_00000000-14_kokkos_depend.ii"
/usr/local/bin/ld: -f may not be used without -shared
collect2: error: ld returned 1 exit status

which is why I tried adding it.