[lammps-users] How to reduce the MSD fluctuation?

Hi lammps users,

Attached is one of my calculated MSD vs time curve of a simple NaCl-water solution (1000 water molecule and 8 Na+/Cl-) after 500 ps running at room temperature and 200 MPa.
I'm wondering how I can reduce the fluctuation?


I stable the system at 295 K and 1 atm,
then increase the pressure to 1974 atm (which is 200 MPa),
finally stable it at this condition for 500 ps.

My MD process code follows:

# msd compute

group Na type 3
group Cl type 4
compute namsd Na msd
compute clmsd Cl msd

# outputs

thermo 100
thermo_style custom step temp pe press density vol c_namsd[4] c_clmsd[4]
dump 1 all custom 2000 traj.lammpstrj id mol type x y z ix iy iz

# MD

timestep 1.0
fix 1 all nvt temp 295.0 295.0 100
run 100000
unfix 1

timestep 1.0
fix 1 all npt temp 295.0 295.0 100 iso 1.0 1.0 100
run 100000
unfix 1

timestep 1.0
fix 1 all npt temp 295.0 295.0 100 iso 1.0 1974.0 100
run 197400
unfix 1

timestep 1.0
fix 1 all npt temp 295.0 295.0 100 iso 1974.0 1974.0 100
run 500000
unfix 1

anyone have some idea? Should I increase the run time, or change the timestep / dump ?
Thank you guys in advance for any suggestions.

Yilong Pan
Western University


There are multiple issues here that you need to worry about.

- you should start accumulation and computation of the MSD only *after*
your system is properly equilibrated. time spent in preparing and
equilibrating the system must not be included.
- your last run starts right after you have completed your pressure ramp.
that means the system is *not* yet equilibrated at the final conditions.
you have to monitor a running average over a sufficiently long time of the
volume and/or pressure carefully to determine whether it has converged.
after you have a converged volume, you may consider running with fixed
volume at that average value. that may be more efficient and makes it
easier to combine the results in a consistent way.
- you may do the fit to get the self-diffusion coefficient only from the
last section of the MSD graph. The Einstein relation only holds for the
long time limit, so for shorter times the curve is not supposed to be linear
- since you do a high pressure and high concentration electrolyte, you have
to expect that everything is slowed down significantly. so a data
accumulation period of 500ps may not be enough to get properly converged
results. Unfortunately, the longer time period you need to get a MSD curve
that follows the Einstein relation, the more data you need to average over
to get converged results
- since you are looking at only the ions, your statistics are quite bad, so
you need a *lot* more simulation data compared to computing self-diffusion
of the solvent.
- for systems in equilibrium, there are always two ways to improve
statistics and thus the convergence of statistical properties: 1) simulate
a larger system and 2) average over multiple data accumulations.

Option 1) is preferred when you have to worry about finite size effects
(self-diffusion is quite sensitive to the system size and under your
conditions it may be even more so than for bulk water under normal
conditions for which studies about such effects exist) but otherwise
computationally more expensive, if you are using long-range electrostatics.
option 2) is more efficient and has the additional benefit of being
conveniently parallel: you can just take the final restart from your
existing simulation (which you hopefully have kept), reinitialize the
velocities with different random seeds and re-equilibrate and decorrelate
for some time (say 200-500ps) and then you can run multiple simulations
concurrently and then average over the results. On top of that each of
these simulations may contain multiple runs that reset/accumulate a new MSD
and since those are already decorrelated from their initial geometry, you
can just start a new accumulation without having to reinitialize velocities
and re-equilibrate. So you can use multiple concurrent runs starting from
decorrelated initial geometries to collect data and also have each of those
collect multiple sets and then average over all of them until you get
reasonably well converged results.