combination of fix_deposit and comm_style tiled gives segfault

Hello people,

I have a reproducible case where a combination of fix_deposit and comm_style tiled makes a simulation crash with a segmentation fault. Using each without the other works fine.

Using comm_style brick instead of comm_style tiled also gives near-perfect load balancing (for me, at least) without any crashes, so this is not a practical problem. But if someone feels it should be fixed, the input/output files are at the URL below.
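
For readers unfamiliar with the two settings, here is a minimal sketch of the communication/balancing choice being compared. It is not taken from the posted input files; the threshold and iteration values are placeholders chosen only for illustration.

```
# Hypothetical fragment, NOT from the posted input files.
# All numeric values are placeholders.

# Variant A: the case that crashes when combined with fix deposit
comm_style      tiled
balance         1.1 rcb              # rcb balancing requires comm_style tiled

# Variant B: the case that runs without crashing
# comm_style    brick
# balance       1.1 shift xyz 10 1.1 # shift balancing works with the brick layout
```

Only one of the two variants would be active in a given run; the rest of the input script stays the same.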

It would not be a very clean case to debug, unfortunately. It's a 60 million atom system, and getting to the crash point requires running on 60 Xeon cores for ~6 hours. That was the shortest run for which I found a reproducible crash; more often it takes a day or more on 60 cores to reach a crash point.

The system is a very low-density W structure, consisting almost entirely of open space with some ellipsoidal W objects obstructing the paths of He ions. The simulation was run with the 14 May 2016 version of LAMMPS, which must be compiled with the MISC package for fix_deposit. The seed must be set to 17859, as is done through the queue submit file in this case.
In the input file, most of the W obstacles are defined as their own group and have their own separate thermostat. Since the obstacles have different sizes, some lines are dedicated to determining which ellipsoidal obstacle each atom belongs to.
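
One common way to set a seed from a submit file is the -var command-line switch. The sketch below assumes that mechanism; the posted submit file may do it differently. The executable name, input file name, fix ID, region name, atom type, and deposition counts are all hypothetical placeholders. Only the seed value 17859 comes from this report.

```
# Hypothetical fragment, NOT from the posted input files.
# Launched from the queue submit file with something like:
#   mpirun -np 60 lmp_mpi -var seed 17859 -in in.deposit

# Placeholder insertion region near the top of the box
region          source block INF INF INF INF 95.0 100.0 units box

# fix ID group deposit N type M seed keyword values ...
# N, type and M are placeholders; ${seed} is replaced by 17859
fix             ions all deposit 1000 2 100 ${seed} region source units box
```

Passing the seed this way keeps the input script unchanged between runs that differ only in the random number stream.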

http://dutsm1219.tudelft.net/crashfiles.tar
(1.9 GB archive, the datafile enddata-end-reduced.gz must be gunzipped)

Regards,
Peter

Can you try running up until the time the last load-balancing occurs, then writing a restart file, then restarting from that file and seeing if the crash happens quickly?
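
A rough sketch of that two-stage procedure, under the assumption that the setup is otherwise unchanged. The step counts and restart file name are placeholders, and anything not stored in the restart file (fixes, groups' thermostats, and so on) would need to be re-specified in the second script.

```
# Hypothetical fragment, NOT from the posted input files.

# --- stage 1: original script, stopped after the last load-balancing ---
# ... original setup, including comm_style tiled and the balance command ...
run             100000               # placeholder step count
write_restart   after_balance.restart

# --- stage 2: separate input script ---
comm_style      tiled
read_restart    after_balance.restart
# ... re-specify pair style settings, groups, thermostats and fix deposit ...
run             100000               # placeholder; continue toward the crash point
```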

Steve

Hi Steve,

The original crash run only did load balancing at the start, since the load balancing hardly changes during the run.

I tried writing a restart file just before the crash point and then checking whether I would still get the crash when restarting from it, but the crash did not reproduce that way.

It appears I can't produce a set of input files that way that requires less CPU time to reach a reproducible crash point. If you have another suggestion on how to achieve that, I'll be happy to give it a try.

greets,
Peter