comm_style tiled segmentation fault

Greetings,

Running the latest stable version (22 Aug 2018) with comm_style tiled gives me a segfault with the attached input. It may be caused by the MPI implementation on this supercomputer, so I’m wondering whether anyone else can reproduce it. The run works fine if I comment out the comm_style tiled and fix balance commands. The LAMMPS build used here was compiled with the “mpi” makefile.

incrystal2.dm (359 Bytes)

Silicon_Crystal.txt (670 KB)

Si.sw (653 Bytes)

How many (or rather how few) MPI ranks are needed to reproduce the crash reliably?

Does the issue persist with the latest patch version (27 November 2018)?

Axel

Tried as low as 8 ranks so far and it segfaults with the Nov 27 development version.

ok. i can reproduce the crash, even with 1 MPI rank. however, you are not making it easy for
LAMMPS. part of your atoms are *outside* the box and shrinkwrap boxes
are tricky to handle for tiled communication under these
circumstances.

so you can simply work around this by first using normal
communication, doing a 'run 0' so that LAMMPS handles all shrinkwrap box
adjustments, and then switching to tiled communication later. e.g. use this input:

units metal

dimension 3
boundary s s s
atom_style atomic
read_data Silicon_Crystal.txt
pair_style sw

Yeah, sorry about that. I had actually noticed it and fixed it, but that didn’t change the segfault. I just put a negative sign on the lower box bounds, as was originally intended.

Silicon_Crystal.txt (670 KB)

at any rate, as long as you put the 'comm_style tiled' command
*after* read_data, it should not crash.
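
a minimal sketch of that ordering, for illustration only. it assumes the data file defines a single atom type with its mass, that Si.sw is the attached potential file, and that the fix ID, balance settings and run length (all hypothetical here) get replaced with whatever the real input uses:

```
units           metal
dimension       3
boundary        s s s
atom_style      atomic
read_data       Silicon_Crystal.txt

pair_style      sw
pair_coeff      * * Si.sw Si          # assumes one atom type, mapped to Si

# the default (brick) communication is still active here, so this
# lets LAMMPS settle the shrink-wrapped box before anything else
run             0

# only now switch to tiled communication and turn on RCB balancing
comm_style      tiled
fix             lb all balance 1000 1.1 rcb   # hypothetical ID and settings

run             1000                  # hypothetical run length
```

the 'run 0' step is the more conservative of the two workarounds; simply moving 'comm_style tiled' below read_data is the shorter variant.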

axel.

p.s.: i am very puzzled, though, why you would want to do load
balancing with recursive bisectioning on this kind of system.

Yeah, that’s a good workaround, thanks. This was just a test to shed light on a different, more complicated setup for a multi-scale code, since what I was seeing in the DDT debugger didn’t add up. It may be that the two segfaults are completely unrelated, however.