Memory issue with qeq/reax/kk on GPU

Hello,

Recently I wanted to try the KOKKOS package and I encountered a problem with qeq/reax/kk. Namely, when I start a simulation on a GPU I get an “out of memory” error, even when there is only one atom in the system. A simple input script that triggers this error is attached to the e-mail. The command I use to run the simulation is “lmp_kokkos -k on gpus 1 -sf kk -in out_of_memory.in”.

The GPU is a GTX 1080, and I set the “KOKKOS_ARCH” variable to “Maxwell53”, as it is the highest compute capability available. I have no problems with other KOKKOS-accelerated pair styles, including reax/c/kk with charge equilibration disabled.

Below is a stack trace from gdb.

I would greatly appreciate any help.

Cheers,
Michal

#0 0x00007f1527357418 in __GI_raise (sig=sig@entry=6) at …/sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007f152735901a in __GI_abort () at abort.c:89
#2 0x00007f1527ebb84d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007f1527eb96b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007f1527eb9701 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007f1527eb9919 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x0000000000e4e443 in Kokkos::Impl::throw_runtime_exception (msg="cudaMalloc( &ptr, arg_alloc_size ) error( cudaErrorMemoryAllocation): out of memory …/…/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp:249")
at …/…/lib/kokkos/core/src/impl/Kokkos_Error.cpp:72
#7 0x0000000000e58970 in Kokkos::Impl::cuda_internal_error_throw (e=<optimized out>, name=name@entry=0xeecdc0 "cudaMalloc( &ptr, arg_alloc_size )", file=file@entry=0xeeccc8 "…/…/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp",
line=line@entry=249) at …/…/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp:133
#8 0x0000000000e54bd5 in Kokkos::Impl::cuda_internal_safe_call (line=249, file=0xeeccc8 "…/…/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp", name=0xeecdc0 "cudaMalloc( &ptr, arg_alloc_size )", e=<optimized out>)
at …/…/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Error.hpp:60
#9 Kokkos::CudaSpace::allocate (arg_alloc_size=18446744073564053424, this=0x7ffc32d3e500) at …/…/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp:249
#10 Kokkos::Experimental::Impl::SharedAllocationRecord<Kokkos::CudaSpace, void>::SharedAllocationRecord (this=0x88b10b0, arg_space=…, arg_label="qeq/kk:jlist", arg_alloc_size=18446744073564053296,
arg_dealloc=0xa14610 <Kokkos::Experimental::Impl::(anonymous namespace)::deallocate<Kokkos::CudaSpace, Kokkos::Experimental::Impl::ViewValueFunctor<Kokkos::Cuda, int, true> >(Kokkos::Experimental::Impl::SharedAllocationRecord<void, void>*)>) at …/…/lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp:451
#11 0x0000000000a2a8ca in Kokkos::Experimental::Impl::SharedAllocationRecord<Kokkos::CudaSpace, Kokkos::Experimental::Impl::ViewValueFunctor<Kokkos::Cuda, int, true> >::SharedAllocationRecord (arg_alloc=18446744073564053296,
arg_label="qeq/kk:jlist", arg_space=…, this=0x88b10b0) at …/…/lib/kokkos/core/src/impl/KokkosExp_SharedAlloc.hpp:209
#12 Kokkos::Experimental::Impl::SharedAllocationRecord<Kokkos::CudaSpace, Kokkos::Experimental::Impl::ViewValueFunctor<Kokkos::Cuda, int, true> >::allocate (arg_alloc=18446744073564053296, arg_label="qeq/kk:jlist", arg_space=…)
at …/…/lib/kokkos/core/src/impl/KokkosExp_SharedAlloc.hpp:234
#13 Kokkos::Experimental::Impl::ViewMapping<Kokkos::Experimental::ViewTraits<int*, Kokkos::LayoutLeft, Kokkos::Cuda, void>, void>::allocate_shared<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::integral_constant<unsigned int, 0u>, Kokkos::CudaSpace, Kokkos::Cuda> (arg_layout=…, arg_prop=…, this=0x7ffc32d3e5e8) at …/…/lib/kokkos/core/src/impl/KokkosExp_ViewMapping.hpp:2568
#14 Kokkos::Experimental::View<int*, Kokkos::LayoutLeft, Kokkos::Cuda, void>::View<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(Kokkos::Experimental::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, std::enable_if<!Kokkos::Experimental::Impl::ViewCtorProp<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::has_pointer, Kokkos::LayoutLeft>::type const&) (this=0x7ffc32d3e5e0, arg_prop=…, arg_layout=…) at …/…/lib/kokkos/core/src/KokkosExp_View.hpp:1238
#15 0x0000000000d035b4 in Kokkos::Experimental::View<int*, Kokkos::LayoutLeft, Kokkos::Cuda, void>::View<char [13]> (arg_N7=0, arg_N6=0, arg_N5=0, arg_N4=0, arg_N3=0, arg_N2=0, arg_N1=0, arg_N0=<optimized out>, arg_label=…,
this=0x7ffc32d3e5e0) at …/…/lib/kokkos/core/src/KokkosExp_View.hpp:1337
#16 LAMMPS_NS::FixQEqReaxKokkos<Kokkos::Cuda>::allocate_matrix (this=this@entry=0x88c16b0) at …/fix_qeq_reax_kokkos.cpp:306
#17 0x0000000000d0cc78 in LAMMPS_NS::FixQEqReaxKokkos<Kokkos::Cuda>::pre_force (this=0x88c16b0, vflag=<optimized out>) at …/fix_qeq_reax_kokkos.cpp:232
#18 0x00000000004c90b4 in LAMMPS_NS::ModifyKokkos::setup_pre_force (this=0x8dc9b70, vflag=2) at …/modify_kokkos.cpp:118
#19 0x00000000009f824e in LAMMPS_NS::VerletKokkos::setup (this=0x8dced20) at …/verlet_kokkos.cpp:123
#20 0x000000000048fe6e in LAMMPS_NS::Run::command (this=this@entry=0x7ffc32d442d0, narg=narg@entry=1, arg=arg@entry=0x8dc36c0) at …/run.cpp:177
#21 0x00000000005feb69 in LAMMPS_NS::Input::command_creator<LAMMPS_NS::Run> (lmp=<optimized out>, narg=1, arg=0x8dc36c0) at …/input.cpp:859
#22 0x00000000005fce4f in LAMMPS_NS::Input::execute_command (this=this@entry=0x5290050) at …/input.cpp:842
#23 0x00000000005fd9b7 in LAMMPS_NS::Input::file (this=0x5290050) at …/input.cpp:243
#24 0x0000000000418af6 in main (argc=9, argv=0x7ffc32d44598) at …/main.cpp:31

out_of_memory.in (430 Bytes)
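
A note on the sizes in the trace: the allocation sizes reported in frames #9 and #10 are just below 2^64, which is what a negative number looks like after conversion to an unsigned 64-bit type. The sketch below reinterprets them as signed values. The 4-byte element size is an assumption based on the view type `View<int*, ...>` in frame #14, and the helper name `as_signed64` is purely illustrative. This only suggests that a negative or uninitialized element count reached the "qeq/kk:jlist" allocation; it does not identify where that count came from.

```python
# Values copied from frames #9 and #10 of the gdb stack trace.
ALLOC_WITH_HEADER = 18446744073564053424  # Kokkos::CudaSpace::allocate
ALLOC_REQUESTED = 18446744073564053296    # SharedAllocationRecord, label "qeq/kk:jlist"


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value does not fit in 64 bits")
    return value - (1 << 64) if value >= (1 << 63) else value


signed_bytes = as_signed64(ALLOC_REQUESTED)

# Difference between the two frames: extra bytes added on top of the request.
header_bytes = ALLOC_WITH_HEADER - ALLOC_REQUESTED

# Assumption: 4-byte elements, because the view holds int values.
signed_elements = signed_bytes // 4

print(signed_bytes)     # -145498320
print(header_bytes)     # 128
print(signed_elements)  # -36374580
```

Read this way, the request corresponds to roughly -36 million int elements, so the “out of memory” message is more likely a symptom of a bad size than of the GPU actually running out of memory.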

This happens because, in the allocate_matrix function, the NumNeigh functor expects the number of neighbors of atom i. Since there is only one atom in the system, there is no such number. That segment of the code should initialize the count to zero, but to be fair, a one-atom system with qeq/reax does not make any sense.

In most cases, having only one atom, or only a handful of atoms in a big box, tests the extremes of memory allocation. I suggest you test this with a regular system, e.g., one of the reax examples. You get good performance on a GPU with a larger system anyway.

Hope this helps.

Ray

Thank you for the prompt response.

Unfortunately, the problem occurs in other systems as well. I also ran into a new failure when running the reax/AB example from the LAMMPS repository. If I only add “newton on” to the input script, I get the following error (the stack trace is attached as stack_trace_1.txt):

what(): cudaDeviceSynchronize() error( cudaErrorIllegalAddress): an illegal memory access was encountered …/…/lib/kokkos/core/src/Cuda/Kokkos_Cuda_Impl.cpp:122

The offending line appears to be in the lmp_control file: “tabulate_long_range 10000”. If I delete it, LAMMPS behaves as in the one-atom case from my first mail: the stack trace is practically the same, and commenting out fix qeq/reax allows a normal run.

I also noticed that when I run LAMMPS compiled with KOKKOS, several processes are started. Is this expected behaviour? I use only a single thread and a single GPU.

Cheers,
Michal

stack_trace_1.txt (4.56 KB)

Michal,

I can’t reproduce your error using the out_of_memory.in input script you attached. It runs fine for me, and I see only a single process when I run with 1 thread and 1 GPU. I think something may have gone wrong with your build. Can you give a few more details? Could you also try running the LJ benchmark in.lj in /bench with Kokkos and see if that works?

I checked, and inum is 1 since there is 1 atom, so I don’t think the problem is in the NumNeigh functor.

Thanks,

Stan

Yes, I can run the LJ benchmark with KOKKOS without any issue. The log is attached to the e-mail.

System details:
Ubuntu 16.04 LTS
Kernel version: 4.4.0-31-generic
NVIDIA driver version: 367.35
CUDA toolkit version: 8.0
Open MPI version: 2.0.0
gcc version: 5.4.0

I used Makefile.kokkos_cuda_openmpi from MAKE/OPTIONS and only changed the “KOKKOS_ARCH” variable to “Maxwell53”.

Cheers,
Michal

log.lammps (2.5 KB)

Michal, it looks like Kokkos doesn’t yet work with your GPU: https://github.com/kokkos/kokkos/issues/398. This should be fixed soon. Your input script runs fine for me on a K80 GPU, so I don’t believe there is a problem in the Kokkos ReaxFF code.

Stan