Coul MSM - Segmentation Fault

Hi all,

My LAMMPS version is 21Oct2021.

I’m trying to simulate a free-surface system with shrink-wrapped (p p s) boundaries. I have output data from a fixed-boundary simulation (p p f), so in my initial configuration the atoms already lie between the z boundaries.

“boundary p p f
pair_style lj/cut/coul/msm 14
.
.
read_data
change_box all boundary p p s”

results in a segmentation fault at the change_box step:
"
Changing box …
Segmentation fault (core dumped)
"
The system starts and runs fine at “p p f” or “p p p” settings, so I doubt there is a dynamics problem. I’m doing this trial on a single CPU, to eliminate some other possibilities.

Any advice would be appreciated.

Under these conditions it is usually better to use “p p m” boundaries instead of “p p s”, as the latter can result in some unexpected shrinkage.

The following minimal example runs without a problem, though:

units real
atom_style charge
boundary p p f
region box block -5 5 -5 5 -5 5
create_box 1 box
create_atoms 1 single  0.0 0.0 -2.5
create_atoms 1 single  0.0 0.0 2.5
set atom 1 charge -1.0
set atom 2 charge  1.0
mass 1 1.0

pair_style lj/cut/coul/msm 10.0
kspace_style msm 1.0e-6
pair_coeff 1 1 0.1 3.0
run 0 post no

change_box all boundary p p s
run 0 post no

Without a minimal reproducer input or a stack trace (see section 11.3, “Debugging crashes”, of the LAMMPS documentation), there is little else that can be done remotely.

Thank you for your comments. It is my bad that I did not provide any files, sorry for that.

Sadly, I don’t have sudo privileges on the systems I use and can’t inspect core dump files. The smallest representative system that reproduces the error is attached. I will try to get access to a system where I can inspect core dumps.

out.txt (952 Bytes)
test.data (1.5 MB)
test.input (391 Bytes)

You don’t need system privileges to run valgrind.
And you don’t have to evaluate core dumps after the fact; you can just run your usual LAMMPS command prefixed with gdb --args and then type “run” at the gdb prompt. It will catch the segfault, and “where” will then give you a stack trace.

Oh, and on my Fedora 35 system, I don’t need to use sudo anyway: coredumpctl debug automatically pulls the last coredump and drops me into a gdb session debugging it.

I mentioned root privileges because those systems neither have these tools (gdb, coredumpctl) installed nor allow me to install them, and “ulimit -c” is set to 0. But I guess this is the right time to stop postponing setting up Ubuntu as a dual boot on my laptop.

The problem is basically in your input itself. You are trying to change the box before it was initialized. LAMMPS should catch that, but it also makes no sense. Why not use boundary p p m (or even p p s) straight from the beginning?
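For illustration, a minimal sketch of what setting the boundary up front could look like. The file name test.data is taken from the attachment above; everything else is assumed to stay as in the existing input, so this is a fragment and not a complete script:

```lammps
# sketch only: choose the final boundary style before the box is created
boundary     p p m          # or: boundary p p s
# ... atom_style, pair_style, kspace_style etc. as in the existing input ...
read_data    test.data      # file name assumed from the attachment
```

With this ordering no change_box command is needed at all.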

Please also note the extremely low accuracy of your MSM calculation.

Without root access you cannot install gdb or valgrind from a package repository, but you can still install them by compiling from source. It is not too difficult, especially gdb. It just takes a little time. But the gdb information can make up for the time invested quickly.

Please note that ulimit -c 0 only suppresses the generation of core files; you can still catch signals within gdb. Example (with your unmodified input):

$ ulimit -c 0
$ gdb --args ./lmp -in test.input
GNU gdb (GDB) Fedora 11.1-5.fc35

[...]

Reading symbols from ./lmp...
(gdb) run
Starting program: /home/akohlmey/compile/lammps/build-test/lmp -in test.input
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
LAMMPS (7 Jan 2022)

[...]

  special bonds CPU = 0.005 seconds
  read_data CPU = 0.094 seconds
Changing box ...

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff4304fba in LAMMPS_NS::MSM::setup (this=0x4bfbf0) at /home/akohlmey/compile/lammps/src/KSPACE/msm.cpp:332
332	  nxhi_direct = static_cast<int> (2.0*ax*delxinv[0]);
(gdb) print delxinv
$1 = (double *) 0x0
(gdb) where
#0  0x00007ffff4304fba in LAMMPS_NS::MSM::setup (this=0x4bfbf0) at /home/akohlmey/compile/lammps/src/KSPACE/msm.cpp:332
#1  0x00007ffff3d0b191 in LAMMPS_NS::Domain::reset_box (this=0x49a3f0) at /home/akohlmey/compile/lammps/src/domain.cpp:506
#2  0x00007ffff3c7acca in LAMMPS_NS::ChangeBox::command (this=<optimized out>, narg=<optimized out>, arg=<optimized out>) at /home/akohlmey/compile/lammps/src/change_box.cpp:355
#3  0x00007ffff3e4eb94 in LAMMPS_NS::Input::execute_command (this=0x47e070) at /home/akohlmey/compile/lammps/src/input.cpp:791
#4  0x00007ffff3e4f107 in LAMMPS_NS::Input::file (this=0x47e070) at /home/akohlmey/compile/lammps/src/input.cpp:270
#5  0x0000000000401298 in main (argc=<optimized out>, argv=<optimized out>) at /home/akohlmey/compile/lammps/src/main.cpp:76

Oh, I was following earlier advice from you, BUT I was apparently foolish enough to forget to add the “run 0” command there.
"
it is still possible to use p p s boundaries, though. you just need to read the data file with p p f, then issue a “run 0” command, and then you can use “change_box all boundary p p s” to switch and have the box size adjusted to shrinkwrapped dimensions.
"

https://sourceforge.net/p/lammps/mailman/lammps-users/thread/CAJh00bm5cNh5SYk%3DDVNWwEDLdAAwLcGpH3102VvbZFsrpa1cbQ%40mail.gmail.com/#msg37101561
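As a rough sketch, the ordering from that advice applied to my input would look like the fragment below. The elided settings and the data file name (test.data, from my attachment) are placeholders and not the complete script:

```lammps
boundary     p p f
pair_style   lj/cut/coul/msm 14
# ... kspace_style and the remaining settings omitted here ...
read_data    test.data                # file name assumed from the attachment
run          0                        # initializes the system (including MSM) first
change_box   all boundary p p s       # only now switch z to shrink-wrapped
```

The only difference from the failing input is the “run 0” line before change_box.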

After adding the “run 0” there it works like a charm. Thank you for your help, I did not know I could use debugging tools without root, so that came as a much appreciated side benefit.

Marcel

The side benefits are mutual. With the test input I created for reproducing your issue, I found and plugged a bunch of (previously rather elusive) memory leaks in the MSM class. I knew about them, but had never come across a good way to trigger them without side effects, and thus had no simple way to plug them. I also updated the change_box command to enforce a system initialization if it is used before that has happened.