[lammps-users] Simulation killed after several thousand timesteps

Hello,

I am running a simulation using several custom fixes and pair styles. The simulation runs successfully for ten thousand or so timesteps, but then ends abruptly. I am running it on Ubuntu, and all I see is the message “Killed”. I otherwise had no issues compiling the code, and I verified with printouts that the custom functions and arrays were working.

I am trying to understand why this “Killed” behavior happens. I realize this is a very broad question; my aim is simply to know which general direction to take when I start troubleshooting. Is this kind of behavior known to occur because of some general aspect of the code (memory management, MPI communication, variables, class structure, etc.)? Any pointers are greatly appreciated.

Thanks,
Anne

This is difficult to advise on without more information. You should check the system log (via journalctl) or the kernel message buffer (via dmesg) for hints about why the process was killed; a bare “Killed” message often means the kernel’s out-of-memory killer terminated the job. It could also be that you were exhausting some limit set by ulimit, for example stack memory, open files, or CPU time.

Is this with a regular or modified LAMMPS version (and what version)? Serial or parallel?
Axel

Please note that for debugging memory (and other) issues there is compiler instrumentation you can enable (most easily when compiling with CMake): https://docs.lammps.org/Build_development.html#address-undefined-behavior-and-thread-sanitizer-support
You can also run LAMMPS under valgrind’s memcheck tool to detect memory issues and leaks (much slower, but much more detailed, than the sanitizer tools): https://docs.lammps.org/Errors_debug.html#using-valgrind-to-get-a-stack-trace. We have recommended command lines and suppressions for known false positives here: https://github.com/lammps/lammps/tree/master/tools/valgrind

Axel.

Great; thank you for the information.

Anne

Hi Axel,

Thanks for the journalctl and dmesg suggestions. I looked into both of those, and it did indeed look like a memory issue.

In case this helps others:

I have a custom post_force fix in which I had declared dynamic per-atom double vectors in the private: section of the header file. Initially these vectors were allocated according to the number of atoms, and on timesteps where the number of atoms changed, the fix would delete and re-allocate them. I would then perform an extract() from a custom pair style into these vectors. I have now instead set the vectors up directly in post_force(), with no allocation, and simply use what extract() returns, and it looks like the problem no longer occurs.
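To make the change concrete, here is a minimal sketch of what the corrected post_force() boils down to. The class name, header, the extract() keyword, and the placeholder force update are stand-ins for my actual code (and I grab the pair style via force->pair for brevity, i.e. assuming no pair_style hybrid), so please treat it as an illustration of the pattern rather than a drop-in fix:

    // fix_mycustom.cpp -- all names below are placeholders
    #include "fix_mycustom.h"
    #include "atom.h"
    #include "force.h"
    #include "pair.h"
    #include "error.h"

    using namespace LAMMPS_NS;

    void FixMyCustom::post_force(int /*vflag*/)
    {
      // Ask the custom pair style for its per-atom data. The pair style
      // owns this memory and keeps it sized to the current number of
      // atoms, so the fix allocates and frees nothing here.
      int dim = 0;
      double *peratom = (double *) force->pair->extract("my_peratom_data",dim);
      if (!peratom)
        error->all(FLERR,"Pair style did not provide my_peratom_data");

      double **f = atom->f;
      const int nlocal = atom->nlocal;

      // Use the extracted values directly (placeholder force update).
      for (int i = 0; i < nlocal; i++)
        f[i][2] += peratom[i];
    }

The key difference is that the per-atom memory is owned and resized by the pair style, which hands out a pointer through its extract() implementation; the fix no longer keeps member arrays that have to be deleted and re-allocated whenever the number of atoms changes.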

I’m using a modified version of the 7 Aug 2019 release. It is compiled with MPI but running on a single processor. I’m aware that version is a bit old, but I believe the memory problem was caused primarily by my mistake of mixing my own allocation with extract() as described above.

Thanks again for your help.

Anne