Hi all
I would like to share some observations I came across while investigating how to restart a simulation.
For simplicity, I have considered a system containing atoms that interact only through the LJ potential. To compare results I am running two simulations: one is the continuous, actual simulation, and the other tries to recreate part of the actual simulation from a restart file.
Actual run: NVT for 100000 timesteps followed by NVT for 200000 timesteps; the restart file is written at the end of the first NVT.
Restart run: only the second part, i.e., NVT for 200000 timesteps, performed starting from the restart file written at the end of the first NVT of the actual simulation.
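Roughly, the workflow of the two runs looks like the following (a minimal sketch, not the exact attached argon.txt; the density, temperature, damping constant, seed, and restart filename are placeholder values):

```
# in.actual -- continuous run: two NVT segments back to back
units           lj
atom_style      atomic
lattice         fcc 0.8442
region          box block 0 10 0 10 0 10
create_box      1 box
create_atoms    1 box
mass            1 1.0
pair_style      lj/cut 2.5
pair_coeff      1 1 1.0 1.0 2.5
velocity        all create 1.0 87287

fix             1 all nvt temp 1.0 1.0 0.5
run             100000                 # first NVT segment
write_restart   restart.100000         # restart written at the end of the first NVT
run             200000                 # second NVT segment
```

while the restart run only redoes the second segment:

```
# in.restart -- restart run: only the second NVT segment
# (for pair_style lj/cut the pair settings are recovered from the restart file)
read_restart    restart.100000
fix             1 all nvt temp 1.0 1.0 0.5
run             200000
```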
When I compare the last 200000 NVT timesteps of both simulations, the thermodynamic data are not exactly the same; I will attach the plot below.
I have run another comparison with the last part of both simulations performed in NVE instead, and in that case I got exactly the same result.
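For that comparison the only change in both input decks is the integrator line, e.g. (using the same placeholder fix ID as above):

```
fix             1 all nve
```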
What conclusion can we draw from this? Is it because NVT has some randomization in its algorithm?
I am attaching the two plots and the input script here. argon.txt (1.1 KB)
I cannot say for certain, since I would need to see the exact and complete input for all four runs that you are comparing and the corresponding log files, but it looks like you either somehow re-initialized fix nvt in the restart run or had some other small difference in the settings. That will lead to the eventual exponential divergence.
Thank you, you were correct. In the actual simulation I was using the same fix ID for the separate successive runs (the fix was unfixed once one segment was completed, but the next segment reused the same fix ID). This caused the fix to be re-initialized when I ran the restart simulation.
I then tried giving a separate fix ID so that re-initialization of the earlier fix was prevented. With that, both simulations gave exactly the same results for around 200000 steps, but after that the restart run started to diverge.
I am attaching the separate input files of both simulations with this. The input files are very simple, since they were only meant to check the restarting of a simulation.
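For reference, the relevant difference is roughly the following (a sketch; the fix IDs nvt1/nvt2, the thermostat settings, and the restart filename are placeholders, not necessarily exactly what is in the attached files). Since read_restart restores a fix's internal state only when the new input defines a fix with the same ID and style, giving the second NVT segment its own fix ID means the thermostat starts from a freshly initialized state in both runs:

```
# in.actual -- second segment now uses its own fix ID
fix             nvt1 all nvt temp 1.0 1.0 0.5
run             100000
write_restart   restart.100000
unfix           nvt1
fix             nvt2 all nvt temp 1.0 1.0 0.5   # fresh Nose-Hoover state
run             200000
```

```
# in.restart
read_restart    restart.100000
# the fix ID does not match the one stored in the restart file ("nvt1"),
# so this fix also starts from a freshly initialized state
fix             nvt2 all nvt temp 1.0 1.0 0.5
run             200000
```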
In a perfect world those two graphs would be identical, but in real life there can still be differences. Having the two runs stay identical for 200000 steps is actually pretty good.
When you use a different number of processors, a different compiler, different compiler optimization settings, etc., the divergence would start (much) earlier. So the difference is likely a single bit in some expression where rounding is applied and where the number is, by chance, stored differently. This is most likely due to compiler optimization, where the compiler may have elected to skip exact conformance with the IEEE 754 floating-point standard for extra speed.
Since this kind of divergence grows exponentially, the two simulations will quickly decorrelate once there is a noticeable difference.
With fix nve this kind of divergence will grow more slowly because there is no thermostat coupling. With fix npt, for example, the divergence will grow even faster, because there are more math operations and the entire system is converted between fractional and Cartesian coordinates multiple times during the timestep, so the numbers are "getting mixed" more.