MD simulation hangs when running on multiple nodes (2 × 80 CPUs) but not on a single node (80 CPUs)

Dear LAMMPS community,
I have been simulating compression of Fe nanoparticle on cluster with meamc package with latest version of lmp_mpi executable installed using “make”. I have been running MD simulations successfully on a single node with 80 cpu. But when i try to do the same with multiple nodes(in this case 2 nodes with 80 cpu, in total 160 cpu), Md simulation energy minimizes successfully and hangs while thermal equilibration after running on 9000 timesteps. I have repeated this simulation with different versions of lmp executables and different input script files. I found that while using multiple nodes alone, I face this issue. I found from the common problems listed in lammps homepage stating that this issue may arise due to memory allocations. But, my simulation cell has only 61993 atoms. So, kindly let me know, is it still memory allocation the issue or else something else and how to tackle this issue. The out.log file is attached for your reference.

LAMMPS (3 Mar 2020)
Reading data file …
orthogonal box = (0 0 0) to (105.295 105.295 105.955)
4 by 5 by 8 MPI processor grid
reading atoms …
61993 atoms
read_data CPU = 0.129565 secs
WARNING: Using ‘neigh_modify every 1 delay 0 check yes’ setting during minimization (…/min.cpp:190)
Neighbor list info …
update every 1 steps, delay 0 steps, check yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 3.9
ghost atom cutoff = 3.9
binsize = 1.95, bins = 55 55 55
3 neighbor lists, perpetual/occasional/extra = 2 1 0
(1) pair meam/c, perpetual
attributes: full, newton on
pair build: full/bin/atomonly
stencil: full/bin/3d
bin: standard
(2) pair meam/c, perpetual, half/full from (1)
attributes: half, newton on
pair build: halffull/newton
stencil: none
bin: none
(3) compute centro/atom, occasional, copy from (1)
attributes: full, newton on
pair build: copy
stencil: none
bin: none
Setting up cg style minimization …
Unit style : metal
Current step : 0
Per MPI rank memory allocation (min/avg/max) = 9.58 | 9.947 | 10.13 Mbytes
Step Temp E_pair E_mol TotEng Press Volume
0 0 -259612.51 0 -259612.51 -5600.6637 1175444.5
181 0 -259798.7 0 -259798.7 -0.0015051703 1172999.9
Loop time of 2.62417 on 160 procs for 181 steps with 61993 atoms

98.8% CPU use with 160 MPI tasks x no OpenMP threads

Minimization stats:
Stopping criterion = energy tolerance
Energy initial, next-to-last, final =
-259612.514588 -259798.695841 -259798.695841
Force two-norm initial, final = 38.4147 9.41444e-06
Force max component initial, final = 0.698271 4.96741e-07
Final line search alpha, max atom move = 1 4.96741e-07
Iterations, force evaluations = 181 361

MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total

Pair | 1.0241 | 1.8352 | 2.5424 | 34.5 | 69.93
Neigh | 2.1935e-05 | 0.0021836 | 0.0040772 | 2.8 | 0.08
Comm | 0.0036802 | 0.58658 | 1.2842 | 52.7 | 22.35
Output | 0 | 0 | 0 | 0.0 | 0.00
Modify | 0 | 0 | 0 | 0.0 | 0.00
Other | | 0.2002 | | | 7.63

Nlocal: 387.456 ave 731 max 0 min
Histogram: 28 8 16 4 12 8 16 16 36 16
Nghost: 604.519 ave 1182 max 1 min
Histogram: 12 20 12 16 16 8 32 12 12 20
Neighs: 2619.25 ave 5092 max 0 min
Histogram: 32 12 8 11 5 14 14 12 39 13
FullNghs: 5238.5 ave 10234 max 0 min
Histogram: 32 12 8 16 0 16 12 12 44 8

Total # of neighbors = 838160
Ave neighs/atom = 13.5202
Neighbor list builds = 4
Dangerous builds = 0
Equilibration started
Setting up Verlet run …
Unit style : metal
Current step : 181
Time step : 0.002
Per MPI rank memory allocation (min/avg/max) = 8.455 | 8.827 | 9.005 Mbytes
Step Lx Ly Lz Pxx Pyy Pzz Temp
181 105.21112 105.21112 105.89648 4384.0353 4375.1527 4383.5824 600
1000 105.62167 106.33341 106.92171 259.53301 -2651.7204 -2201.5771 300.07451
2000 105.8271 106.18737 107.10834 -4158.717 -4895.0946 -5071.8849 302.5528
3000 105.67708 106.38152 107.07854 -2947.466 -4338.2213 -3929.2514 300.57732
4000 105.51068 106.01229 107.1277 -722.23834 -1377.8135 -2228.3651 299.94536
5000 105.5777 106.09702 106.8721 1032.3115 425.9865 633.57173 299.71821
6000 105.77597 106.18139 107.08216 1446.6004 1047.6276 641.0847 301.52681
7000 105.66961 106.35977 107.00775 2388.5032 1378.7901 1122.5734 300.02605
8000 105.68289 106.22016 106.75727 4097.807 3053.5372 3352.3676 300.03533
9000 105.63295 106.03107 106.95652 3727.5894 3796.1332 2599.7099 301.90837

The system hangs exactly at timestep 9000.

Your output says that you are running LAMMPS version 3 Mar 2020, which is far from the latest release.

Using 160 MPI processes for a system with only about 62000 atoms is overkill: your log shows fewer than 400 atoms per MPI rank on average, so communication overhead dominates and the run will likely be slower than with fewer processors.

But regardless of that, the first order of business is to determine whether:

  • this is an issue specific to your input
  • this is an issue specific to using multiple nodes on your cluster
  • this is an issue specific to older versions of LAMMPS

Thus, please do the following:

  • run the “in.lj” input in the LAMMPS “bench” folder with the same processor settings; if possible, also run the “in.rhodo” input
  • run your input with fewer processors per node (e.g. 40 instead of 80)
  • compile and test the latest release of LAMMPS (22 December 2022) and run with that
  • remove all superfluous computation and output commands from your input
  • report the output of lmp_mpi -h
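The checklist above could be scripted roughly as follows. This is a minimal sketch: the paths ($HOME/lammps, lmp_mpi location), the input file name in.compression, and the Open MPI option --map-by ppr:40:node are assumptions — adapt them to your cluster, batch system, and MPI launcher.

```shell
#!/bin/sh
# Hypothetical paths and launcher -- adjust to your cluster setup.
LAMMPS_DIR=$HOME/lammps
LMP=$LAMMPS_DIR/src/lmp_mpi

# 0) Optionally build the latest release the same way as before
#    (with the MEAM package enabled):
#    cd $LAMMPS_DIR/src && make yes-meam && make mpi

# 1) Baseline: standard LJ benchmark with the same 2 x 80 layout.
mpirun -np 160 "$LMP" -in "$LAMMPS_DIR/bench/in.lj" -log bench_lj.log

# 2) If available, the rhodopsin benchmark as well.
mpirun -np 160 "$LMP" -in "$LAMMPS_DIR/bench/in.rhodo" -log bench_rhodo.log

# 3) Your input with fewer processes per node (40 instead of 80);
#    --map-by ppr:40:node is Open MPI syntax, other MPIs differ.
mpirun -np 80 --map-by ppr:40:node "$LMP" -in in.compression -log half_ppn.log

# 4) Record the build and version information for the report.
"$LMP" -h > lmp_help.txt
```

If the stock benchmarks also hang when spanning two nodes, the problem is in the MPI or cluster configuration rather than in LAMMPS or your input; if they run cleanly, the input script and LAMMPS version become the prime suspects.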