Segmentation fault while using KOKKOS

Dear LAMMPS Mailing List,

I am using the 5May2020 version of LAMMPS. My goal is to use the KOKKOS acceleration package together with pair_style hybrid. I recently posted a message here about an error with pair_style hybrid and KOKKOS, and the outcome of that discussion was that I should comment out a line in the code. I did that, and the earlier error message is gone. However, when I run the script I now get segmentation faults with the signal code "Address not mapped". I can run the script without the KOKKOS package, and I can run other scripts with the KOKKOS package (ones that do not use pair_style hybrid), but when I try to run this script on the GPU(s), I get messages like this:

[gpu001:457814] *** Process received signal ***
[gpu001:457814] Signal: Segmentation fault (11)
[gpu001:457814] Signal code: Address not mapped (1)
[gpu001:457814] Failing at address: 0x481
[gpu001:457814] [ 0] /usr/lib64/libpthread.so.0(+0xf5d0)[0x2aaaacafe5d0]
[gpu001:457814] [ 1] /home/int/sam/defoort/lammps/lammps-patch_5May2020/build-kokkos-pair-mods/lmp_mpi[0x85bfcf]
[gpu001:457814] [ 2] /home/int/sam/defoort/lammps/lammps-patch_5May2020/build-kokkos-pair-mods/lmp_mpi[0x8a2994]
[gpu001:457814] [ 3] /home/int/sam/defoort/lammps/lammps-patch_5May2020/build-kokkos-pair-mods/lmp_mpi[0xa46f02]
[gpu001:457814] [ 4] /home/int/sam/defoort/lammps/lammps-patch_5May2020/build-kokkos-pair-mods/lmp_mpi[0x4cb529]
[gpu001:457814] [ 5] /home/int/sam/defoort/lammps/lammps-patch_5May2020/build-kokkos-pair-mods/lmp_mpi[0x9ebea8]
[gpu001:457814] [ 6] /home/int/sam/defoort/lammps/lammps-patch_5May2020/build-kokkos-pair-mods/lmp_mpi[0x45b236]
[gpu001:457814] [ 7] /home/int/sam/defoort/lammps/lammps-patch_5May2020/build-kokkos-pair-mods/lmp_mpi[0x4e98c5]
[gpu001:457814] [ 8] /home/int/sam/defoort/lammps/lammps-patch_5May2020/build-kokkos-pair-mods/lmp_mpi[0x455496]
[gpu001:457814] [ 9] /home/int/sam/defoort/lammps/lammps-patch_5May2020/build-kokkos-pair-mods/lmp_mpi[0x453ab7]
[gpu001:457814] [10] /home/int/sam/defoort/lammps/lammps-patch_5May2020/build-kokkos-pair-mods/lmp_mpi[0x4541c4]
[gpu001:457814] [11] /home/int/sam/defoort/lammps/lammps-patch_5May2020/build-kokkos-pair-mods/lmp_mpi[0x409306]
[gpu001:457814] [12] /usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2aaaad5c73d5]
[gpu001:457814] [13] /home/int/sam/defoort/lammps/lammps-patch_5May2020/build-kokkos-pair-mods/lmp_mpi[0x44c37f]
[gpu001:457814] *** End of error message ***

I know segmentation faults are not easy to solve remotely, but since the script runs fine without the KOKKOS package, my first guess is that I made a mistake when compiling the executable.
I am working on my university's high-performance cluster, which has GPU nodes with Skylake CPUs (2 x NVIDIA V100 16 GB per node), and I used this command line to build my KOKKOS executable (I need the BODY / MANYBODY / MOLECULE / KSPACE / RIGID / MISC / MC / REAXC / OMP / KOKKOS packages):

cmake -DCMAKE_INSTALL_PREFIX=/home/int/sam/defoort/lammps/newinstalldir -DBUILD_OMP=yes -DKokkos_ARCH_SKX=yes -DLAMMPS_MACHINE=mpi -DPKG_BODY=yes -DPKG_MANYBODY=on -DPKG_MOLECULE=on -DPKG_KSPACE=on -DPKG_RIGID=on -DPKG_MISC=on -DPKG_MC=on -DPKG_MOLECULE=on -DPKG_USER-REAXC=on -DPKG_USER-OMP=on -DPKG_KOKKOS=on -DKokkos_ENABLE_CUDA=yes -DKokkos_ARCH_VOLTA70=ON -DCMAKE_CXX_COMPILER=wrapper -DKokkos_ENABLE_OPENMP=yes -DBUILD_OMP=yes -DCMAKE_CXX_COMPILER=/home/int/sam/defoort/lammps/lammps-patch_5May2020/lib/kokkos/bin/nvcc_wrapper …/cmake/

To run the script I am using this command:
PROGNAME=~/lammps/lammps-patch_5May2020/build-kokkos-pair-mods/lmp_mpi
OMP_PROC_BIND=true
OMP_PLACES=threads
mpirun -np 8 $PROGNAME -pk kokkos newton on neigh half comm no -k on g 2 -sf kk -in in.si.sputtering_init
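
A side note on the two OMP_ lines quoted above: as written they are plain shell assignments. If the actual job script does not export them (an assumption, since only this excerpt is shown), child processes such as mpirun will not see them. A minimal sketch of the difference, using only standard shell behaviour:

```shell
#!/bin/sh
# Start from a clean state so the demonstration does not depend on the caller's environment.
unset OMP_PLACES

# Plain assignment: the variable exists in this shell only.
OMP_PLACES=threads
before=$(sh -c 'echo "${OMP_PLACES:-unset}"')

# After export, child processes inherit the variable.
export OMP_PLACES
after=$(sh -c 'echo "${OMP_PLACES:-unset}"')

echo "child before export: $before"
echo "child after export:  $after"
```

Whether the variables also reach MPI ranks on other nodes depends on the MPI launcher and its settings, which this sketch does not cover.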

Has anyone encountered such problems? If so, how should I approach solving these segmentation faults? Any advice is more than welcome. I can attach my script if needed, but since the script works without KOKKOS, I believe the script is fine and that my KOKKOS build is at fault.

Thank you very much,
Best regards,
Grégoire

Gregoire,

there are multiple things that you can do to simplify identifying the cause of the segmentation fault and aid in debugging this:

  • compile/link with debug info included: that lets you get a meaningful stack trace and identify the exact location of the segfault
  • compile/link KOKKOS without OpenMP and/or without GPU support: that lets you tell whether this is a general KOKKOS problem or something particular to a specific class or platform
  • compile a version with the latest development code from GitHub: that ensures that you have up-to-date code with all bugfixes included
  • identify whether the segfault happens during the “setup” phase of a run or later during the MD (just do a “run 0”)
  • run a variant of your input with all optional fixes and computes removed (even time integration), to narrow down whether the issue is due to the force computation or the system manipulation or analysis
  • if the previous version does not crash, put the computes and fixes back one by one to narrow down which is triggering the issue
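
As an illustration of the first point, the raw frame addresses in a backtrace like the one posted can be turned into function names and source lines with addr2line from GNU binutils, provided the crashing binary contains debug info. The sketch below is not part of the original advice: the helper name extract_addrs and the file name crash.log are hypothetical, and it assumes a non-relocated executable whose frames appear as `lmp_mpi[0x...]`.

```shell
#!/bin/sh
# Print the frame addresses that belong to the given executable in an
# Open MPI style backtrace. Frames from shared libraries are skipped.
#   $1: file holding the saved crash output
#   $2: executable name as it appears in the trace (e.g. lmp_mpi)
extract_addrs() {
    grep -o "$2\[0x[0-9a-f]*\]" "$1" | sed 's/.*\[\(0x[0-9a-f]*\)\]/\1/'
}

# Hypothetical usage, assuming the crash output was saved to crash.log and
# ./lmp_mpi is the very binary that produced it:
#   extract_addrs crash.log lmp_mpi | xargs addr2line -f -C -e ./lmp_mpi
# -f prints function names, -C demangles C++ symbols, -e names the executable.
```

The addresses are only meaningful for the exact binary that crashed, so after rebuilding with debug info (for example by adding the standard CMake setting -DCMAKE_BUILD_TYPE=RelWithDebInfo to the configure line) the crash has to be reproduced and the new trace used.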

Axel.

If you have a reproducer input script I would be happy to look at it. This could be a bug in the Kokkos package.

Stan
