I am trying to use the Kokkos package to run some ReaxFF calculations on the GPU. When I run an example calculation (examples/reaxff/RDX) with the following command:
lmp -in in.RDX -suffix kk > out.RDX
I get the following error:
terminate called after throwing an instance of 'std::runtime_error'
what(): Constructing View and initializing data with uninitialized execution space
Traceback functionality not available
[Heisenberg:10160] *** Process received signal ***
[Heisenberg:10160] Signal: Aborted (6)
[Heisenberg:10160] Signal code: (-6)
[Heisenberg:10160] [ 0] /lib64/libpthread.so.0(+0xf370)[0x7f0c533ae370]
[Heisenberg:10160] [ 1] /lib64/libc.so.6(gsignal+0x37)[0x7f0c527771d7]
[Heisenberg:10160] [ 2] /lib64/libc.so.6(abort+0x148)[0x7f0c527788c8]
[Heisenberg:10160] [ 3] /home/rashid/Apps/spack/opt/spack/linux-centos7-broadwell/gcc-11.2.0/gcc-7.5.0-eoiw7jz56uxm2vul7osvg4xhrfdwnwk5/lib64/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x125)[0x7f0c530ad7d5]
[Heisenberg:10160] [ 4] /home/rashid/Apps/spack/opt/spack/linux-centos7-broadwell/gcc-11.2.0/gcc-7.5.0-eoiw7jz56uxm2vul7osvg4xhrfdwnwk5/lib64/libstdc++.so.6(+0x8f5b6)[0x7f0c530ab5b6]
[Heisenberg:10160] [ 5] /home/rashid/Apps/spack/opt/spack/linux-centos7-broadwell/gcc-11.2.0/gcc-7.5.0-eoiw7jz56uxm2vul7osvg4xhrfdwnwk5/lib64/libstdc++.so.6(+0x8f601)[0x7f0c530ab601]
[Heisenberg:10160] [ 6] /home/rashid/Apps/spack/opt/spack/linux-centos7-broadwell/gcc-11.2.0/gcc-7.5.0-eoiw7jz56uxm2vul7osvg4xhrfdwnwk5/lib64/libstdc++.so.6(+0x8f843)[0x7f0c530ab843]
[Heisenberg:10160] [ 7] /home/rashid/Apps/lammps-30Jul2021/build_gcc_kokkos_cuda11/lmp[0x16db7c0]
[Heisenberg:10160] [ 8] /home/rashid/Apps/lammps-30Jul2021/build_gcc_kokkos_cuda11/lmp[0x53a280]
[Heisenberg:10160] [ 9] /home/rashid/Apps/lammps-30Jul2021/build_gcc_kokkos_cuda11/lmp[0xd73889]
[Heisenberg:10160] [10] /home/rashid/Apps/lammps-30Jul2021/build_gcc_kokkos_cuda11/lmp[0x9293b1]
[Heisenberg:10160] [11] /home/rashid/Apps/lammps-30Jul2021/build_gcc_kokkos_cuda11/lmp[0x9260bf]
[Heisenberg:10160] [12] /home/rashid/Apps/lammps-30Jul2021/build_gcc_kokkos_cuda11/lmp[0x91e0a4]
[Heisenberg:10160] [13] /home/rashid/Apps/lammps-30Jul2021/build_gcc_kokkos_cuda11/lmp[0x4781f4]
[Heisenberg:10160] [14] /home/rashid/Apps/lammps-30Jul2021/build_gcc_kokkos_cuda11/lmp[0x4746d6]
[Heisenberg:10160] [15] /home/rashid/Apps/lammps-30Jul2021/build_gcc_kokkos_cuda11/lmp[0x412ddf]
[Heisenberg:10160] [16] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f0c52763b35]
[Heisenberg:10160] [17] /home/rashid/Apps/lammps-30Jul2021/build_gcc_kokkos_cuda11/lmp[0x44c1a3]
[Heisenberg:10160] *** End of error message ***
Any pointers on how to solve this so as to get the Kokkos package running would be very helpful.
I am using LAMMPS-30Jul2021, compiled with GCC 7.5.0 using Open MPI and CUDA 11.4.0. The cmake command I used to build LAMMPS is:
I forgot to update the results here after my previous message.
This was the issue: I had not read the Kokkos package page fully. After fixing my command based on the instructions there, the problem was solved. The command that worked was this:
lmp -in in.RDX -k on g 1 -suffix kk -pk kokkos newton on neigh half > out.RDX
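For anyone who hits the same "uninitialized execution space" error: the original command only set the kk suffix and never switched Kokkos on with -k on, which appears to be what left the execution space uninitialized. Below is a minimal sketch of how the working command is put together. The variable names (NGPUS, KOKKOS_ARGS, CMD) are only illustrative and not part of LAMMPS, and the GPU count of 1 is taken from the command above.

```shell
#!/bin/sh
# Number of GPUs per node handed to Kokkos (assumption: one GPU, as in the working command).
NGPUS=1

# -k on g N    : enable Kokkos and tell it to use N GPUs per node
# -suffix kk   : use the /kk variants of styles wherever they exist
# -pk kokkos   : package options; newton on / neigh half are the values used above
KOKKOS_ARGS="-k on g ${NGPUS} -suffix kk -pk kokkos newton on neigh half"

# Assemble the full command line for the RDX example.
CMD="lmp -in in.RDX ${KOKKOS_ARGS}"

# Print the command; to actually run it, replace the echo with:
#   ${CMD} > out.RDX
echo "${CMD}"
```

With NGPUS=1 this prints the same command line as the one that worked for me.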
Even though the calculation completes successfully, the following error is displayed at the end of every run with the above command:
Kokkos::Cuda ERROR: Failed to call Kokkos::Cuda::finalize()
I hope this is not a serious problem, since it appears only after the calculation has finished.
I also tried the current stable release (29 Sep 2021), compiled with the same parameters mentioned above. With it, however, I get an error when I try to run the same calculation as above:
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: (nil)
[ 0] /lib64/libpthread.so.0(+0xf370)[0x7f10c3d5b370]
[ 1] /home/rashid/GitProjects/lammps/build_kokkos/lmp[0x610a4b]
[ 2] /home/rashid/GitProjects/lammps/build_kokkos/lmp[0x85d115]
[ 3] /home/rashid/GitProjects/lammps/build_kokkos/lmp[0x6c6e0b]
[ 4] /home/rashid/GitProjects/lammps/build_kokkos/lmp[0x8acc61]
[ 5] /home/rashid/GitProjects/lammps/build_kokkos/lmp[0x52ce5c]
[ 6] /home/rashid/GitProjects/lammps/build_kokkos/lmp[0x46e0df]
[ 7] /home/rashid/GitProjects/lammps/build_kokkos/lmp[0x46e7db]
[ 8] /home/rashid/GitProjects/lammps/build_kokkos/lmp[0x411b18]
[ 9] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f10c3412b35]
[10] /home/rashid/GitProjects/lammps/build_kokkos/lmp[0x44ca03]
*** End of error message ***
As might have been clear from the above post, the error mentioned previously does not occur with the stable release (29 Sep 2021), but with a pre-release version (patch_27Oct2021). It was a mistake on my part to say otherwise.
@rashidrafeek thanks for the update and the clarification. It is reassuring to see that the stable release is working as expected.
If you have the time, please also repeat the test with the “develop” branch. There was one bugfix applied to LAMMPS since the 27 October 2021 release that is related to ReaxFF and thus is likely the fix for the segmentation fault you have observed, but it would be nice to have the confirmation. Thanks.