Metadynamics convergence / explicit water NVT and NPT

alphataubio · August 10, 2024, 6:21am

(1) what does the abbreviation RC stand for?

(2) i’ve been trying to understand the concept of PMF convergence for some time. as a mathematician, i’m used to quantitative metrics of convergence and actual formulas for error bounds. so far all i’ve seen are qualitative statements about PMF convergence. if i knew the PMF in advance, i could compute a distance metric between the current approximation and the true answer. if, however, i don’t know the actual PMF a priori, is there a numerical criterion i can use to measure convergence quantitatively, and a stopping criterion to check at each step?

i have another question that’s bothering me…

(3) the reference “Efficient Reconstruction of Complex Free Energy Landscapes by Multiple Walkers Metadynamics”, cited in the Colvars manual, states:

The normal dynamics of the system is assumed to be of the Langevin form

however, the LAMMPS documentation says:

Apply a Langevin thermostat as described in (Schneider) to a group of atoms which models an interaction with a background implicit solvent

i have plenty of quota on the clusters i’m working with, so i don’t want to “cut corners” by going with implicit water. i much prefer to use explicit TIP3P water with the CHARMM force field (pair style lj/charmmfsw/coul/long). does that mean i can’t use NVT or NPT for thermostatting, barostatting, and time integration?

srtee · August 10, 2024, 7:09am

Hi there,

(1) I meant “RC” to stand for “reaction coordinate”, or the collective variable more generally. (It was not good form for me to use the initialism without defining it!)

(2) I’m afraid molecular dynamics does much for the practitioner’s convenience and very little for the theorist’s conviction. A good starting source is “Quantifying uncertainty and sampling quality in biomolecular simulations” (Grossfield and Zuckerman 2009), and as they say:

No method known to the authors can report on a simulation’s failure to visit an important region of configuration space unless these regions are already known in advance.

Thus, we instead focus on assessing sampling quality in the regions of space that have been visited.

Nightmarish examples are readily conjured. For example, a simulation of water at the triple point (of its force field model) would not have “converged” until several transitions had been observed between the solid, liquid, and gas states, and a “trajectory average” of any quantity would depend entirely on the relative durations of all states while being completely uninformative about any one state.

As another example, if you start an unbiased simulation of a completely unfolded polypeptide in salty water, then for anything longer than a dozen or two residues you are exceedingly unlikely to observe the folded state in any reasonable time, despite you (and even the force field!!) knowing it is the “ground” state.

So take Grossfield’s advice: worry less about the (configurational) space you haven’t visited and focus on the space you have. Even an estimate of autocorrelation time, and thus the effective number of independent samples, will (unfortunately) put you in the top decile of thoroughness among all published molecular dynamics results.
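To make the autocorrelation suggestion concrete, here is a minimal Python sketch of one common way to estimate an integrated autocorrelation time and the resulting effective number of independent samples. The function names, the rule of truncating the sum at the first non-positive autocorrelation value, and the synthetic AR(1) test series are all illustrative assumptions, not something prescribed by Grossfield and Zuckerman or by any particular package; in practice `x` would be a time series of your collective variable (or another observable) from the region you have actually sampled.

```python
import numpy as np


def integrated_autocorr_time(x, max_lag=None):
    """Estimate the integrated autocorrelation time, in units of samples.

    tau = 1 + 2 * sum_k rho(k), with the sum truncated at the first
    non-positive rho(k) (a simple, commonly used heuristic).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    dx = x - x.mean()
    var = np.dot(dx, dx) / n
    if var == 0.0:
        return 1.0  # constant series: no fluctuations to correlate
    if max_lag is None:
        max_lag = n // 2
    tau = 1.0
    for lag in range(1, max_lag):
        # Normalised autocorrelation at this lag
        rho = np.dot(dx[:-lag], dx[lag:]) / ((n - lag) * var)
        if rho <= 0.0:
            break
        tau += 2.0 * rho
    return tau


def effective_sample_size(x):
    """Number of effectively independent samples in the series x."""
    return len(x) / integrated_autocorr_time(x)


def ar1_series(n, phi, seed=0):
    """Synthetic correlated data (hypothetical stand-in for a CV trajectory)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = noise[0]
    for i in range(1, n):
        x[i] = phi * x[i - 1] + noise[i]
    return x


if __name__ == "__main__":
    x = ar1_series(20000, phi=0.9, seed=1)
    tau = integrated_autocorr_time(x)
    print(f"tau ~ {tau:.1f} samples, N_eff ~ {effective_sample_size(x):.0f}")
```

For an AR(1) process the exact value is (1 + phi) / (1 - phi), so the estimate can be sanity-checked against a known answer before being trusted on real data. Note that this only quantifies sampling quality within the visited region; it says nothing about states the trajectory never reached.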

(3) The Langevin dynamics model of the collective variables is a general statistical mechanical result. Briefly, you separate your “slow” dynamics of interest from the “fast” dynamics that ~~annoy your supervisor~~ make your integration timestep too short, and then you ~~handwave vigorously~~ use the Mori-Zwanzig formalism to project out the Langevin dynamics needed for your slow dynamics.

I’m mostly kidding about the handwaving – Langevin dynamics (especially the “generalised” sort) are very useful, helpful and robust. More importantly, one can expect the suitable collective variables of a system to follow Langevin dynamics, regardless of the exact “micro” thermostat used at particle level. (More precisely, look into the Mori-Zwanzig formalism and check if any assumptions are broken.)
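For reference, the “generalised” Langevin equation that such a projection typically yields for a single collective variable $s$ can be written schematically as below. This is a sketch under simplifying assumptions (a constant effective mass $m$ and a memory kernel $K$ that does not depend on $s$); the symbols are introduced here for illustration and are not taken from the Colvars manual or the LAMMPS documentation.

```latex
m\,\ddot{s}(t) = -\frac{\mathrm{d}F(s)}{\mathrm{d}s}
  - \int_0^{t} K(t - t')\,\dot{s}(t')\,\mathrm{d}t' + R(t),
\qquad
\langle R(t)\,R(t') \rangle = k_{\mathrm{B}} T\, K(|t - t'|)
```

Here $F(s)$ is the PMF, $K$ is the memory (friction) kernel, and $R(t)$ is the random force, tied to $K$ by the fluctuation-dissipation relation. In the Markovian limit $K(t) = 2\gamma\,\delta(t)$ this reduces to ordinary Langevin dynamics with friction coefficient $\gamma$, which is the form assumed in the metadynamics reference. The fast degrees of freedom, including explicit water and whatever particle-level thermostat is applied to it, enter only through $K$ and $R$.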