Second run fails when calculating clustering

Hi,

I’m trying to calculate solid cluster sizes and I’ve run into a problem.

I’m using a compute reduce to get the largest cluster number (I know this is just the ID of the atom first identified in the cluster, but it forces the clustering to update). With this in my thermo_style, a single run command works fine, but a second run only works if the first ran for fewer than 10 steps. If the first run was for 10 or more steps, LAMMPS hangs after printing the thermo_style headings.

This problem appears to go away at small system sizes, but I haven’t narrowed down exactly where the limit lies.

Have I missed something about clustering in LAMMPS that is causing this?

In my actual script I’m using q6 and coordination number to determine solid particles.
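For concreteness, that kind of selection can be written along these lines; the commands exist in LAMMPS, but the cutoffs and thresholds here are placeholders rather than my real values:

# per-atom q6 bond-orientational order parameter
compute q6 all orientorder/atom degrees 1 6 cutoff 1.5
# per-atom coordination number
compute coord all coord/atom cutoff 1.5
# flag an atom as solid when both criteria pass (placeholder thresholds)
variable issolid atom "c_q6[1]>0.35 && c_coord>10"
group solid dynamic all var issolid every 1
# cluster only the solid atoms
compute cluster solid cluster/atom 1.5

I’ve simplified my script to hopefully focus on the issue: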

log test.log
units lj
atom_style atomic
atom_modify map hash
lattice fcc 1.0 spacing 1 1 1

region box block 0 10 0 2 0 2 units lattice
# doesn't work at 10 x 2 x 2 - 160 atoms
# works at 9 x 2 x 2 - 144 atoms

create_box 1 box
create_atoms 1 box

mass 1 1.0
pair_style lj/cut 3.5
pair_modify tail yes
pair_coeff 1 1 1.0 1.0

# do clustering on all atoms

compute cluster all cluster/atom 1.5

# get largest cluster number

compute max all reduce max c_cluster

thermo_style custom step c_max
thermo 1
fix 1 all nph iso 0.02 0.02 1
fix 2 all langevin 1.5 1.5 0.1 1

run 1

run 10

run 2

run 10

I’m running the 23 Oct 2017 version of LAMMPS. This happens on macOS 10.12.6 with LAMMPS compiled with gcc 7.2.0 (from MacPorts), and on SUSE Enterprise 11.4 with LAMMPS compiled with gcc 6.3 or icc 17.0.2.

Here’s the output I get when it’s not working:

LAMMPS (23 Oct 2017)
Lattice spacing in x,y,z = 1.5874 1.5874 1.5874
Created orthogonal box = (0 0 0) to (15.874 3.1748 3.1748)
  1 by 1 by 1 MPI processor grid
Created 160 atoms
  Time spent = 0.000475883 secs
Neighbor list info ...
  update every 1 steps, delay 10 steps, check yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 3.8
  ghost atom cutoff = 3.8
  binsize = 1.9, bins = 9 2 2
  2 neighbor lists, perpetual/occasional/extra = 1 1 0
  (1) pair lj/cut, perpetual
      attributes: half, newton on
      pair build: half/bin/atomonly/newton
      stencil: half/bin/3d/newton
      bin: standard
  (2) compute cluster/atom, occasional
      attributes: full, newton on
      pair build: full/bin/atomonly
      stencil: full/bin/3d
      bin: standard
Setting up Verlet run ...
  Unit style    : lj
  Current step  : 0
  Time step     : 0.005
Per MPI rank memory allocation (min/avg/max) = 4.325 | 4.325 | 4.325 Mbytes
Step c_max
0 1
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 1
10 1
Loop time of 0.00756097 on 1 procs for 10 steps with 160 atoms

Performance: 571355.384 tau/day, 1322.582 timesteps/s
99.6% CPU use with 1 MPI tasks x no OpenMP threads

MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total

Just a quick note: this is probably related to the issue described here:

https://sourceforge.net/p/lammps/mailman/message/36105303/

Both computes use the same communication pattern and logic to iterate information across MPI ranks. I can reproduce it and have some traces, but have not yet understood what exactly is causing the deadlock. This is quite complex, and I am currently extremely busy with other tasks.

axel.

Hello Craig and Axel,

I think that I have diagnosed the problem here: https://github.com/lammps/lammps/pull/728. If I’m correct, you should be able to avoid it for now by running with “pre no” on the second run, or by ensuring that the neighbor list isn’t built on the last step of the first run (i.e., by making its length a non-multiple of 10).
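
For example, with the script above (default neighbor settings, where a build can coincide with the last step of a 10-step run), either of the following should sidestep the hang; treat this as an untested sketch:

run 10
run 10 pre no    # skip the setup phase of the second run

or

run 11           # not a multiple of 10, so no neighbor build on the last step
run 10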

-David