TIP4P/ice not running in parallel for an ice box with a vacuum region

Hello lammps users,

I am trying to use the TIP4P/ice model to simulate ice with a large vacuum region in the y direction. The simulation runs successfully on my local desktop, both serially and in parallel, and it also runs on an HPC system when using a single core. When running in parallel on many cores on the HPC, however, the job keeps consuming CPU time while only very limited information is written to the log file. I have also tried a few things, for example:

(1) Switching to the TIP3P model, which works fine on the HPC;

(2) Adding the command “fix fixbalance all balance 100 1.2 shift y 100 1.2”, which still does not work on the HPC;

(3) Using “comm_style tiled” and “fix 2 all balance 1000 1.1 rcb”, which gives the error “PPPM can only currently be used with comm_style brick”. The LAMMPS version I am using is 7 Aug 2019.

My input file is:

units real
boundary p p p
atom_style full
read_data ice.data

pair_style lj/cut/tip4p/long 2 1 1 1 0.1577 12.0
kspace_style pppm/tip4p 1.0e-5
pair_modify tail yes

pair_coeff 2 2 0.21084 3.1668
pair_coeff 1 2 0.0 0.0
pair_coeff 1 1 0.0 0.0
bond_style harmonic
bond_coeff 1 0.0 0.9572
angle_style harmonic
angle_coeff 1 0.0 104.52

neighbor 2.0 bin
neigh_modify every 1 delay 0 check yes

timestep 1.0
velocity all create 10 34455 dist gaussian mom yes rot yes
fix 1 all nvt temp 10 10 100
fix 4 all shake 0.0001 10 1000 b 1 a 1

#comm_style brick
#fix fixbalance all balance 100 1.2 shift y 100 1.2

#comm_style tiled
#fix 2 all balance 1000 1.1 rcb

thermo_style custom step temp pe ke etotal press
thermo 100
run 1000

All of the information in the log file is:

2 = max # of 1-2 neighbors
1 = max # of 1-3 neighbors
1 = max # of 1-4 neighbors
2 = max # of special neighbors
special bonds CPU = 0.0028851 secs
read_data CPU = 0.199092 secs
2400 atoms in group g_wallOut
39312 atoms in group water
24000 atoms in group outWall
System init for write_data …
PPPM initialization …
extracting TIP4P info from pair style
using 12-bit tables for long-range coulomb (…/kspace.cpp:323)
G vector (1/distance) = 0.347119
grid = 96 160 96
stencil order = 5
estimated absolute RMS force accuracy = 0.00017633
estimated relative force accuracy = 1.22454e-05
using single precision MKL FFT
3d grid and FFT values/proc = 102850 62208
Last active /omp style is kspace_style pppm/tip4p/omp
Neighbor list info …
update every 1 steps, delay 10 steps, check yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 12
ghost atom cutoff = 12
binsize = 6, bins = 17 35 17
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair lj/cut/tip4p/long/omp, perpetual
attributes: half, newton on, omp
pair build: half/bin/newton/omp
stencil: half/bin/3d/newton
bin: standard
Finding SHAKE clusters …
0 = # of size 2 clusters
0 = # of size 3 clusters
0 = # of size 4 clusters
13104 = # of frozen angles
find clusters CPU = 0.002249 secs
PPPM initialization …
extracting TIP4P info from pair style
using 12-bit tables for long-range coulomb (…/kspace.cpp:323)
G vector (1/distance) = 0.347119
grid = 96 160 96
stencil order = 5
estimated absolute RMS force accuracy = 0.00017633
estimated relative force accuracy = 1.22454e-05
using single precision MKL FFT
3d grid and FFT values/proc = 102850 62208
Last active /omp style is kspace_style pppm/tip4p/omp
Setting up Verlet run …
Unit style : metal
Current step : 0
Time step : 0.001

I really appreciate your help.

Best wishes,

Qiangqiang Sun

In most cases where a simulation stalls like this, there is a bad initial geometry.
The fact that the TIP3P version runs does not mean that the geometry is correct.
It would be helpful if you could provide the log information from a run that is not stalling.
This is almost certainly not a load balancing issue, so any attempts in that direction are pointless.

Without seeing more details there is not much to suggest or recommend. There are three general suggestions that may or may not help improve the situation:

  • Try running your system with the latest version of LAMMPS.
  • Try running a minimization before starting MD. Since minimization is not compatible with fix shake, you cannot define that fix during the minimization; instead, use large force constants for the bond and angle potentials to preserve the shape of the water molecules (see the sketch after this list).
  • After defining fix shake, it is often a good idea to issue a “run 0” command, so that some of the initial data is initialized more cleanly and the internal extrapolation is more accurate on the first MD step.
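
A minimal sketch of how the last two suggestions could be combined, assuming the bond/angle type numbering from the input file above; the stiff force constants are placeholder values used only for the minimization, not TIP4P/ice parameters:

# minimize with a stiff flexible water model first (no fix shake defined yet)
bond_coeff 1 450.0 0.9572      # placeholder stiff constant to hold the O-H bond length
angle_coeff 1 55.0 104.52      # placeholder stiff constant to hold the H-O-H angle
min_style cg
minimize 1.0e-4 1.0e-6 1000 10000

# restore the zero force constants used with SHAKE, then constrain the geometry
bond_coeff 1 0.0 0.9572
angle_coeff 1 0.0 104.52
fix 4 all shake 0.0001 10 1000 b 1 a 1
run 0                          # clean initialization before the production run
run 1000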

axel.

Hello Axel,

Thank you very much for your comments.

I tried the three suggestions you made, but the problem is still not solved.

I can run my ice Ih model successfully using one core on the ARCHER HPC, but with 24 cores the run stalls. The corresponding log files are attached.

PS: I expanded my simulation box in the y direction from [0.481, 30.799] to [0, 80] to create a vacuum region. I am sorry that the data file for the ice Ih model is 1.2 MB and cannot be attached.
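
For reference, a minimal sketch of one way such a gap could be created directly in the input script instead of editing the data file, using the y bounds mentioned above (change_box is a standard LAMMPS command):

# read the dense ice box, then open up the vacuum gap along y
read_data ice.data
change_box all y final 0.0 80.0 units box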

Thank you a lot for any more comments.

Best wishes,

Qiangqiang

On Tue, Jul 21, 2020 at 2:35 PM, Axel Kohlmeyer <[email protected]> wrote:

1core_log.dat (6.71 KB)

24Cores.dat (2 KB)

You have a rather small system, and using as many as 24 MPI ranks increases the chance of having MPI ranks without any atoms. The fact that you say you have a large gap in your system makes that much more likely. MPI ranks without atoms have been known to occasionally expose bugs in a few parts of LAMMPS for some combinations of features. One such issue, which can affect fix shake, was discovered and fixed recently. You would need to use the very latest LAMMPS patch (version 21 July 2020, i.e. released today) to have that bugfix included.

To confirm that the gap is causing the issue, try running with:

processors * 1 *

which will constrain the number of MPI ranks in the y direction to 1 (normally the decomposition is determined by volume). This may be a good choice anyway and may improve parallel efficiency without having to resort to load balancing commands.
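
A minimal sketch of where this command would go, assuming the input file posted earlier in this thread (the processors command must appear before the simulation box is defined, i.e. before read_data):

units real
boundary p p p
atom_style full
processors * 1 *     # single layer of MPI ranks along y; must precede read_data
read_data ice.data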

axel.