Segmentation Fault with GPU and lj/class2/coul/long

All,

I’m having a strange issue with a segmentation fault (signal 11) when running with pair_style lj/class2/coul/long and kspace_style pppm on a GPU. The segmentation fault disappears when I run the same simulation script with pair_style lj/class2/coul/cut instead of lj/class2/coul/long. I also tested a completely different system that uses pair_style lj/charmm/coul/long and kspace_style pppm with the GPU, and that one works. There IS a GPU variant of lj/class2/coul/long, so it should work. From my Googling, signal 11 usually means that a “pointer pointed to a location in memory outside of the program’s space”. I’m not familiar with C++, but why would this happen for one pair style and not another? These simulations run fine on the CPU.

Any tips or suggestions would be much appreciated!

I’m using the latest unstable branch of LAMMPS, by the way. The GPU library was built with make lib-gpu args="-m mpi -a sm_75 -b". I included the following packages: class2, compress, gpu, kspace, manybody, mc, molecule, qeq, rigid, user-misc, and user-reaxc. I’m running Linux Mint 19.3 (based on Ubuntu 18.04).
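For what it’s worth, the overall build sequence went roughly like this (only a sketch, assuming the traditional make-based build; treat the exact commands as approximate):

  # in the LAMMPS src/ directory (sketch; traditional make-based build assumed)
  make lib-gpu args="-m mpi -a sm_75 -b"    # build the GPU library for compute capability 7.5
  make yes-class2 yes-compress yes-gpu yes-kspace yes-manybody yes-mc
  make yes-molecule yes-qeq yes-rigid yes-user-misc yes-user-reaxc
  make mpi                                  # build the LAMMPS executable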

The segmentation fault occurs right where the step-0 thermo output should appear. I’ve pasted some of the output below.

fix 1 all bond/react stabilization yes statted_grp .03 react rxn1 all 100 0 7 pre post map_file_alt.txt prob 0.001 42552 stabilize_steps 500 update_edges charges
dynamic group bond_react_MASTER_group defined
dynamic group statted_grp_REACT defined
fix 2 statted_grp_REACT nvt temp 300 300 100
fix 3 bond_react_MASTER_group temp/rescale 1 300 300 10 1

thermo_style custom step temp press etotal ke pe epair ebond eangle edihed eimp elong lx ly lz pxx pyy pzz vol density v_sxx v_syy v_szz f_sxx_ave f_syy_ave f_szz_ave c_cnt_nylon f_poly_cnt f_1[1]
run 600000 # 600 ps
PPPM initialization …
using 12-bit tables for long-range coulomb (…/kspace.cpp:323)
G vector (1/distance) = 0.304537
grid = 90 80 75
stencil order = 5
estimated absolute RMS force accuracy = 0.000379821
estimated relative force accuracy = 1.14382e-06
using double precision KISS FFT
3d grid and FFT values/proc = 370968 273600

Please provide the exact LAMMPS version string instead of “latest”, since “latest” is a moving target.

This sounds like a manifestation of a bug that was already fixed.

Axel.

Axel,

I pulled the unstable branch today, so it should be the 4 Feb 2020 version. LAMMPS also reports 4 Feb 2020 when it’s run. Should I try the master branch?

Thanks,
Will

Version 4 Feb 2020 should have the bugfix I was thinking about.
I made some simple checks, and it doesn’t look like lj/class2/coul/long/gpu as such is the issue here, so there has to be some bad interaction between using a GPU pair style and some of the other features that you are using.

Can you please try two things:

  1. Reduce the system size to check whether that makes a difference.
  2. Remove all additional features and output so that you are left with a minimal input deck doing plain MD. If that runs without a crash, add back what you removed one piece at a time to identify what is causing the segfault (a rough sketch of such a minimal deck follows below).
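
Something along these lines would do as a starting point (only a rough sketch; the data file name, cutoff, kspace accuracy, and thermostat settings are placeholders and must match your system):

  # minimal test deck (sketch; adjust file names and settings to your system)
  units           real
  atom_style      full
  pair_style      lj/class2/coul/long 10.0
  bond_style      class2
  angle_style     class2
  dihedral_style  class2
  improper_style  class2
  kspace_style    pppm 1.0e-4
  read_data       system.data
  velocity        all create 300.0 12345
  fix             1 all nvt temp 300 300 100
  thermo          100
  run             1000
  # run with GPU acceleration enabled, e.g.: lmp -sf gpu -pk gpu 1 -in in.minimal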

Once you have narrowed down the source of the issue, please post the complete input deck for this minimal test case and I will take a look and try to track down the cause.

Axel.

Thanks! Reducing the system size did not make a difference. However, removing all additional features did result in a simulation that ran on the GPU. I have narrowed the segfault down to compute group/group. I have attached the complete input deck as well as the data file it reads in.
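
For reference, the compute group/group invocation in question has this general form (the group names and keyword below are placeholders for illustration, not the ones from the actual input deck):

  # sketch of a compute group/group invocation (group names are placeholders)
  group           grpA type 1 2
  group           grpB type 3 4
  compute         AB grpA group/group grpB kspace yes
  thermo_style    custom step temp pe c_AB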

Thanks!
Will

N1124IFFDfy64.dat (432 KB)

N0214IFFEq64.in (1.41 KB)

Thanks.

That helped a lot, and I’ve identified the source of the issue.

You can avoid it by adding:

pair_modify table 0
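
In the input script this goes right after the pair_style command, e.g. (cutoff and kspace accuracy below are placeholders):

  pair_style      lj/class2/coul/long 10.0
  pair_modify     table 0        # disable tabulated coulomb; evaluate it analytically instead
  kspace_style    pppm 1.0e-4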

Please note that compute group/group will not run on the GPU but on the CPU, and thus can slow down your simulation quite considerably and negate the benefit of GPU acceleration.

Axel.

pair_modify table 0 works great! Thank you very much! I appreciate the help. I’ll avoid invoking compute group/group too often when running with the GPU.

Thanks again,
Will


No problem. It helps to have a good test example that makes it easy to reproduce the issue.

A proper fix will be in the next LAMMPS patch release, which will avoid the segfault without the workaround.

Axel.