Hello LAMMPS users,
I study the static and dynamic properties of one or several polymers in various geometries (free space, slit, and cylinder) at different volume fractions of Lennard-Jones (LJ) particles (crowders), at a reduced temperature equivalent to T=300K. To span volume fractions from 0 to 0.45 with all other parameters fixed, I vary the number of crowders between 0 and ~250000. I set r_cut=1.6sigma (where sigma=1) so that I have a purely repulsive Lennard-Jones (WCA) interaction among all the particles (monomers and crowders). The degree of polymerization N is in the range 50-3000. A polymer chain can be heterogeneous, so its monomers can have sizes between a_mon=1sigma and 10sigma (which means different r_cut values for different monomers). All the crowders that make up the fluid have size sigma=1. A typical system is equilibrated for 10^6 timesteps and then run for 10^8 timesteps for data sampling. The simulations use a Langevin thermostat together with fix NVE (see the attached input file and the log file of a test simulation).
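The relation between volume fraction and crowder number can be sketched in a few lines of Python. This is only an illustration: it assumes the volume fraction is defined as phi = n_crd * (pi/6) * a_crd^3 / V, which the text above does not state explicitly, and the function names and the box size in the usage lines are hypothetical.

```python
import math

def cube_volume(side):
    """Volume of a cubic box of the given side length (in units of sigma)."""
    return side ** 3

def cylinder_volume(diameter, length):
    """Volume of a cylinder of the given diameter and axial length."""
    return math.pi * diameter ** 2 / 4.0 * length

def crowder_count(phi, volume, a_crd=1.0):
    """Number of spheres of diameter a_crd that occupy a fraction phi of volume.

    Assumes phi = n * (pi / 6) * a_crd**3 / volume (assumed definition).
    """
    v_sphere = math.pi * a_crd ** 3 / 6.0
    return round(phi * volume / v_sphere)

# Hypothetical example: a cubic box of side 60 sigma at phi = 0.3
print(crowder_count(0.3, cube_volume(60.0), a_crd=1.0))
```

In a confined geometry the volume actually accessible to the crowder centres is smaller than the nominal cylinder or slit volume, so the count from `cylinder_volume` should be read as a rough estimate.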
I used the following neighbour-list settings in the first two test runs:
neighbor 0.2 bin
neigh_modify delay 0 every 1 check yes one 300 page 3000
I got this summary for a chain of length N=2000 with a_mon=sigma and 39000 crowders of size a_crd=sigma in a cylindrical confinement (PBC along the axis of symmetry):
Run name: N2000epsilon5.0r10.5lz100.0sig1.0nc39000ens1
Loop time of 2012.52 on 32 procs for 5000000 steps with 41900 atoms
Performance: 1073283.905 tau/day, 2484.453 timesteps/s
86.2% CPU use with 32 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
Pair    | 129.02     | 134.35     | 141.52     |  33.9 |  6.68
Bond    | 0.8848     | 7.9447     | 17.355     | 228.5 |  0.39
Neigh   | 411.79     | 425.92     | 441.76     |  43.8 | 21.16
Comm    | 469.04     | 525.47     | 585.83     | 144.9 | 26.11
Output  | 14.212     | 16.203     | 23.934     |  43.7 |  0.81
Modify  | 736.52     | 792.64     | 851.14     | 110.7 | 39.39
Other   |            | 110        |            |       |  5.46

Nlocal: 1309.38 ave 1394 max 1246 min
Histogram: 2 5 4 7 0 4 4 3 2 1
Nghost: 565.594 ave 625 max 529 min
Histogram: 4 5 3 4 4 7 3 1 0 1
Neighs: 1125.06 ave 1227 max 1021 min
Histogram: 2 0 3 5 6 7 2 4 0 3

Total # of neighbors = 36002
Ave neighs/atom = 0.859236
Ave special neighs/atom = 0.0954177
Neighbor list builds = 1114837
Dangerous builds = 0
System init for write_restart …
Total wall time: 5:46:28
And I got this summary for a chain of length N=80 with a_mon=sigma and ~79000 crowders of size a_crd=0.2sigma in a cylindrical confinement (PBC along the axis of symmetry):
Run name: N80epsilon5.0r3.5lz29.25sig0.2nc78624ens1
Loop time of 21186.1 on 32 procs for 5000000 steps with 78704 atoms

Performance: 40781.384 tau/day, 236.003 timesteps/s
77.0% CPU use with 32 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
Pair    | 401.08     | 820.37     | 1599.5     |1301.8 |  3.87
Bond    | 1.1507     | 2.2343     | 4.5161     |  71.9 |  0.01
Neigh   | 13070      | 13233      | 13390      |  83.9 | 62.46
Comm    | 3710.3     | 4199.8     | 4884.8     | 438.7 | 19.82
Output  | 30.826     | 34.533     | 50.451     |  66.3 |  0.16
Modify  | 1898.4     | 2544.4     | 3208.8     | 750.2 | 12.01
Other   |            | 352.2      |            |       |  1.66

Nlocal: 2459.5 ave 3641 max 1398 min
Histogram: 16 0 0 0 0 0 0 1 6 9
Nghost: 7992.56 ave 10665 max 5447 min
Histogram: 8 8 0 0 0 0 0 8 0 8
Neighs: 17812.8 ave 28692 max 9476 min
Histogram: 16 0 0 0 0 0 1 4 7 4

Total # of neighbors = 570010
Ave neighs/atom = 7.24245
Ave special neighs/atom = 0.00200752
Neighbor list builds = 481583
Dangerous builds = 0
Total wall time: 57:32:50
I changed the settings to the following for the simulation below:
neighbor 0.2 bin
neigh_modify every 1 delay 0 check yes page 200000
Here is the summary for a chain of length N=100 with monomers of size a_mon=sigma and 5sigma and ~124000 crowders of size a_crd=1.0sigma in free space (a cube):
Run name: N100dl5.0nl2l30dc1.0nc123759dtbdump2000adump20000ens1
Loop time of 16734.9 on 32 procs for 5000000 steps with 123859 atoms

Performance: 51628.506 tau/day, 298.776 timesteps/s
99.8% CPU use with 32 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
Pair    | 962.92     | 989.79     | 1138.8     | 100.3 |  5.91
Bond    | 0.9434     | 1.746      | 4.1282     |  76.4 |  0.01
Neigh   | 9956.5     | 10111      | 10160      |  41.7 | 60.42
Comm    | 3045.6     | 3314.9     | 3558       | 268.8 | 19.81
Output  | 125.85     | 126.21     | 127.03     |   1.9 |  0.75
Modify  | 1625.1     | 1854.3     | 2109.3     | 352.8 | 11.08
Other   |            | 336.8      |            |       |  2.01

Nlocal: 3870.59 ave 3918 max 3828 min
Histogram: 2 4 3 3 5 5 1 6 1 2
Nghost: 13057.5 ave 13119 max 12978 min
Histogram: 1 2 3 0 4 4 8 6 3 1
Neighs: 9996.34 ave 10337 max 9764 min
Histogram: 4 3 4 5 5 2 5 3 0 1

Total # of neighbors = 319883
Ave neighs/atom = 2.58264
Ave special neighs/atom = 0.00159859
Neighbor list builds = 498549
Dangerous builds = 0
Total wall time: 55:18:44
Since I used the loop functionality of LAMMPS to split the dump files into chunks in the 1st and 3rd simulations above, the statistics shown for them are for 1 loop out of 10; the statistics for the other loops are nearly the same.
As you can see, building the neighbour lists and communication together take about 47% of the loop time in the first run and about 80% in the other two. How can I reduce these two times?
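The combined Neigh and Comm share, and how often the neighbour lists are rebuilt, can be re-derived from the three summaries. A minimal Python sketch, with the numbers copied by hand from the logs above; the function and variable names are only illustrative:

```python
# Numbers copied from the three run summaries above.
# Each entry: (label, steps, neighbor list builds, Neigh %total, Comm %total)
runs = [
    ("N=2000, 39000 crowders, cylinder", 5_000_000, 1_114_837, 21.16, 26.11),
    ("N=80, 78624 crowders, cylinder", 5_000_000, 481_583, 62.46, 19.82),
    ("N=100, 123759 crowders, free space", 5_000_000, 498_549, 60.42, 19.81),
]

def summarize(steps, builds, neigh_pct, comm_pct):
    """Return (Neigh + Comm share in %, average steps between list rebuilds)."""
    return neigh_pct + comm_pct, steps / builds

for label, steps, builds, neigh_pct, comm_pct in runs:
    share, interval = summarize(steps, builds, neigh_pct, comm_pct)
    print(f"{label}: Neigh+Comm = {share:.2f}% of loop time, "
          f"one rebuild every {interval:.2f} steps on average")
```

The second number is simply the step count divided by "Neighbor list builds", so it shows how frequently the `check yes` criterion triggers a rebuild with the 0.2 skin in each run.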
The LAMMPS version is 2019-08-07.
Please find attached the log file and the LAMMPS input file I used for the last simulation.
Thanks for your help,
Amir
N100dl5.0nl2l30dc1.0nc123759dtbdump2000adump20000ens4.log (467 KB)
N100dl5.0nl2l30dc1.0nc123759dtbdump2000adump20000ens1.data (7.62 KB)
N100dl5.0nl2l30dc1.0nc123759dtbdump2000adump20000ens1.lmp (4.57 KB)
N80epsilon5.0r3.5lz29.25sig0.2nc78624ens1.data (6.59 KB)