Segmentation fault using Granular pair style

Hello everyone,

I’m running LAMMPS version 21 Nov 2023 on Ubuntu 22.04. I’m compressing a system with NPT and then holding it at a target pressure with a second NPT fix. However, my simulation crashes every time in between unfixing the first NPT fix and applying the second one.

I’m posting here for help with analyzing the stack trace from the debugger. It appears to be an issue with the atom type indices in the transfer_history() function of the granular pair style. I would appreciate more insight into this stack trace.

Thread 1 "lmp_omp1" received signal SIGSEGV, Segmentation fault.
0x00005555563239f3 in LAMMPS_NS::PairGranular::transfer_history (this=0x555559b127a0, source=0x55555b2b6060, target=0x55555a22a3a0, itype=<optimized out>, jtype=2) at ../pair_granular.cpp:836
836       class GranularModel* model = models_list[types_indices[itype][jtype]];
(gdb) where
#0  0x00005555563239f3 in LAMMPS_NS::PairGranular::transfer_history (this=0x555559b127a0, source=0x55555b2b6060, 
    target=0x55555a22a3a0, itype=<optimized out>, jtype=2) at ../pair_granular.cpp:836
#1  0x0000555555d7a087 in LAMMPS_NS::FixNeighHistory::pre_exchange_newton (this=0x555559b4fe10)
    at ../fix_neigh_history.cpp:435
#2  0x0000555555d78df0 in LAMMPS_NS::FixNeighHistory::pre_exchange (this=0x555559b4fe10)
    at ../fix_neigh_history.cpp:232
#3  LAMMPS_NS::FixNeighHistory::pre_exchange (this=0x555559b4fe10) at ../fix_neigh_history.cpp:227
#4  LAMMPS_NS::FixNeighHistory::write_restart (this=0x555559b4fe10, fp=0x555559b27470)
    at ../fix_neigh_history.cpp:872
#5  0x000055555585dc9c in LAMMPS_NS::Modify::write_restart (this=0x5555597f9ec0, fp=0x555559b27470)
    at ../modify.cpp:1467
#6  0x00005555559da916 in LAMMPS_NS::WriteRestart::write (this=0x5555599a5a30, file="Post_compress.restart")
    at ../write_restart.cpp:241
#7  0x00005555559dbabb in LAMMPS_NS::WriteRestart::command (this=0x5555599a5a30, narg=1, arg=0x555559b412d0)
    at ../write_restart.cpp:113
#8  0x00005555557faf4c in LAMMPS_NS::Input::execute_command (this=0x555559705fb0) at ../input.cpp:868
#9  0x00005555557fb927 in LAMMPS_NS::Input::file (this=0x555559705fb0) at ../input.cpp:313
#10 0x00005555557e9721 in main (argc=<optimized out>, argv=<optimized out>) at ../main.cpp:77

Any insight would be appreciated, thanks in advance!

There is not much more that can be said. The stack trace tells you exactly which command fails and the line in the source code where it happens. So you need to inspect the variables and arrays accessed on that line and determine, from the source code and your settings, whether they contain what is expected.
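
For example, from frame #0 in your backtrace you could print the indices and the arrays they are used with (you may need to rebuild with debug info and without optimization, since itype shows up as <optimized out> here):

(gdb) frame 0
(gdb) print itype
(gdb) print jtype
(gdb) print types_indices[itype][jtype]
(gdb) print models_list[types_indices[itype][jtype]]

If one of the indices is out of range for the arrays, that will show up immediately.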

For any advice beyond that, you need to provide a suitably simple test case, information about how you configured and compiled LAMMPS, which platform you are running it on, and the command line with which you start it.

This is a completely different issue.

This is too large a box/system for easy debugging.

Let’s not pile additional issues on top of the existing one, but rather keep the number of “complications” as small as possible and address them one at a time.

You may want to consider using the “overlap” keyword to create_atoms to avoid close contacts right away rather than depending on the minimization to resolve them.
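
For example, something along these lines (a sketch only: the type, count, seed, and region name are placeholders, and the overlap distance should be on the order of the smallest particle diameter):

# reject insertions closer than 2.0 length units to an existing atom; retry up to 50 times per particle
create_atoms    1 random 10000 783421 simbox overlap 2.0 maxtry 50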

These settings are just crazy and make no sense at all. Why not stick with the (conservative) defaults first and see if additional changes would be needed or helpful later?

Does it run with “bin” for you? It does for me.
Do you see the code spending excessive amounts of time in the neighbor list builds? I don’t. And that makes perfect sense, because your “small” particles are only 10% of your total system, so the total speedup from optimizing the neighbor list build would be rather small. Have a look at Amdahl’s Law for how much speedup you can theoretically achieve when you can improve only a small part of your calculation.
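
To put a number on it: if p is the fraction of the total runtime that a faster neighbor build could even affect and s is the speedup of that part, the overall speedup is bounded by

S = 1 / ((1 - p) + p/s) <= 1 / (1 - p)

so if, say, p were around 0.1, even an infinitely fast neighbor build would gain you at most a factor of about 1/0.9 ≈ 1.11, i.e. roughly 11% overall.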

In summary, before looking at the details of crashes, you first need to build an input deck that works and gives meaningful results. There is no point in debugging something that is just a hodge-podge of meaningful input mixed with settings that make no sense or with options that are only there to improve performance.
But before it is worth testing and debugging, you need to provide a solid baseline.

Hello @akohlmey, I’ve been experimenting with the above simplified script and with my original script.
I’ve made the box smaller and also introduced the overlap keyword, to reduce the burden on the minimization.

You’re right about the neigh_modify settings. I had progressively increased the one and page sizes since my simulations always crashed with the error “ERROR on proc 11: Neighbor history overflow, boost neigh_modify one (…/fix_neigh_history_omp.cpp:277)”. However, after further testing it seems that the crash occurs only when I try to write a restart file using the write_restart command.
I’m still working on reproducing this issue with my simplified system to see how this can be narrowed down further.
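
Roughly, the sequence I am trying to reduce it to looks like this (the coefficients and fix values below are placeholders, not my actual settings):

pair_style      granular
pair_coeff      * * hertz/material 1e7 0.3 0.3 tangential mindlin NULL 1.0 0.5

fix             comp all npt/sphere temp 300.0 300.0 0.1 iso 0.0 1.0e5 1.0
run             50000
unfix           comp

write_restart   Post_compress.restart    # this is where the segmentation fault occurs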

I was trying to use multi since it was suggested for systems with significant differences in particle size. I thought these sizes were significantly different; it seems that I was wrong. However, I see that a significant chunk of my time is spent in communication (about 25% using 8 MPI tasks). This seems unnaturally high for a system with 40000 particles. Do you have any recommendations for improving my scaling efficiency?

Meanwhile, I’ll continue working on simplifying this issue further and post a final script once I narrow it down.

Thank you for taking the time to analyze my script!

You didn’t pay attention to what I wrote. They are significantly different, but “multi” neighbor lists are only a significant benefit if the smaller kind of particle makes up a large majority of the system. Please study the most recent LAMMPS paper to learn about neighbor list stencils and so on, and study the LAMMPS manual. You are simplifying the situation too much. Yes, you could speed up the search for neighbors of the small particles a bit, but since those are only 10% of your system, there is little potential for overall speedup. As I mentioned before, have a look at Amdahl’s Law. While it covers speedup through parallelization, it also applies to speedup through algorithmic improvements. And Amdahl’s Law doesn’t consider overhead: the multi neighbor list style has additional overhead over bin, so it is not even obvious whether there would be an improvement at all.
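
For reference, the switch in the input script is just the style argument of the neighbor command (the skin value here is a placeholder):

neighbor        0.3 bin
# neighbor      0.3 multi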

If this happens over time, then chances are, there is something wrong with your model and the system is collapsing. You can confirm this easily with visualization.
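
For example, a dump that you can load into OVITO or VMD (file name and output interval are placeholders):

dump            viz all custom 1000 traj.lammpstrj id type radius x y z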

My recommendation is to worry about correctness first and performance later. You are obviously infected by a sickness that is called “premature optimization™”.