Segmentation fault in colvarbias_meta::calc_energy() with useGrids off

alphataubio · August 10, 2024, 12:29am

@giacomo.fiorin i was trying to evaluate the following statement from the colvars lammps documentation (section 6.5.1):

In typical scenarios the Gaussian hills of a metadynamics potential are interpolated and summed together onto a grid, which is much more efficient than computing each hill independently at every step (the keyword useGrids is on by default). This numerical approximation typically yields negligible errors in the resulting PMF [1].

My strategy was to run a minimal example colvars-lj with useGrids on vs useGrids off to measure the numerical approximation errors from useGrids.
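A rough way to see what such a comparison measures is a one-dimensional toy model: tabulate a sum of Gaussian hills on a grid once, then compare interpolated values with the exact sum. The sketch below is only an illustration under assumed values (hill centers, height, width, and grid spacing are all hypothetical), and it uses plain linear interpolation, which is not necessarily the scheme Colvars uses internally.

```python
import math
import random

def hill_sum(x, centers, height, sigma):
    """Exact bias energy at x: sum every Gaussian hill explicitly."""
    return sum(height * math.exp(-0.5 * ((x - c) / sigma) ** 2) for c in centers)

def interp_on_grid(x, lower, spacing, values):
    """Linear interpolation between the two tabulated points bracketing x."""
    i = min(int((x - lower) / spacing), len(values) - 2)
    t = (x - lower) / spacing - i
    return (1.0 - t) * values[i] + t * values[i + 1]

# Hypothetical hills: 200 centers drawn uniformly, fixed height and width.
rng = random.Random(0)
centers = [rng.uniform(2.0, 8.0) for _ in range(200)]
height, sigma = 0.01, 0.5

# Tabulate the hills once on a grid with spacing sigma / 5 (an assumption).
lower, spacing, n_points = 0.0, 0.1, 101
grid_energy = [hill_sum(lower + i * spacing, centers, height, sigma)
               for i in range(n_points)]

# Compare gridded and exact energies at points between the grid nodes.
probe = [10.0 * k / 999 for k in range(1000)]
exact = [hill_sum(x, centers, height, sigma) for x in probe]
gridded = [interp_on_grid(x, lower, spacing, grid_energy) for x in probe]

max_abs_err = max(abs(g - e) for g, e in zip(gridded, exact))
rel_err = max_abs_err / max(exact)
print(f"max abs error = {max_abs_err:.3e}, relative to peak = {rel_err:.3e}")
```

This compares the two energy surfaces directly for the same set of hills, which isolates the interpolation error from any differences in the trajectories themselves.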

colvars-lj.in (607 Bytes)
colvars-lj.colvars (434 Bytes)

However, when running with useGrids off, i get a segmentation fault in colvarbias_meta.cpp:675:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x140)
  * frame #0: 0x0000000107360910 liblammps_kk.0.dylib`colvar_grid<double>::new_index(this=0x0000000000000000) const at colvargrid.h:813:30
    frame #1: 0x0000000107365ca8 liblammps_kk.0.dylib`colvar_grid<double>::get_colvars_index(this=0x0000000000000000) const at colvargrid.h:665:22
    frame #2: 0x000000010737c9cc liblammps_kk.0.dylib`colvarbias_meta::calc_energy(this=0x000000013484da00, values=size=0) at colvarbias_meta.cpp:675:37
    frame #3: 0x0000000107349450 liblammps_kk.0.dylib`colvarbias::update(this=0x000000013484da00) at colvarbias.cpp:347:28
    frame #4: 0x000000010737aa8c liblammps_kk.0.dylib`colvarbias_meta::update(this=0x000000015a81d800) at colvarbias_meta.cpp:448:35

frame #2: 0x000000010737d30c liblammps_kk.0.dylib`colvarbias_meta::calc_energy(this=0x000000015105d400, values=size=0) at colvarbias_meta.cpp:675:37
   672 	
   673 	  std::vector<int> const curr_bin = values ?
   674 	    hills_energy->get_colvars_index(*values) :
-> 675 	    hills_energy->get_colvars_index();
   676 	
   677 	  if (hills_energy->index_ok(curr_bin)) {
   678 	    // index is within the grid: get the energy from there

(lldb) p hills_energy
(colvar_grid_scalar *) nullptr
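
The backtrace shows `hills_energy` being dereferenced while it is null, which suggests the energy calculation takes the grid path even though no grid was allocated. The Python sketch below illustrates the kind of guard that situation calls for: use the grid only if it exists, otherwise sum the hills analytically. All names here (`calc_energy`, `hill_energy`, the `grid` dictionary layout) are hypothetical; this is not the Colvars source and not the submitted fix.

```python
import math

def hill_energy(x, center, height, sigma):
    """Energy of a single Gaussian hill at x."""
    return height * math.exp(-0.5 * ((x - center) / sigma) ** 2)

def calc_energy(x, hills, grid=None):
    """Bias energy at x.

    hills: list of (center, height, sigma) tuples.
    grid:  optional dict with 'lower', 'width', and per-bin 'values';
           None models running without grids.
    """
    if grid is not None:
        i = math.floor((x - grid["lower"]) / grid["width"])
        if 0 <= i < len(grid["values"]):
            # Index is within the grid: read the tabulated energy.
            return grid["values"][i]
    # No grid allocated, or x lies outside it: sum the hills directly.
    return sum(hill_energy(x, c, h, s) for c, h, s in hills)

hills = [(0.0, 1.0, 1.0)]
grid = {"lower": -1.0, "width": 1.0, "values": [0.5, 0.9]}
print(calc_energy(0.0, hills))        # no grid: analytic sum
print(calc_energy(0.0, hills, grid))  # grid lookup
print(calc_energy(5.0, hills, grid))  # outside grid: analytic sum
```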

i filed a bug report at the colvars github repo.

does that mean the colvars library assumes users will only ever run with useGrids on (the default), because nobody would want to turn it off for performance reasons?

i have a different question about the validity of PMFs from metadynamics with explicit water that i’ll post in LAMMPS Beginners shortly.

grazie !

alphataubio · August 10, 2024, 1:55am

i submitted a bugfix PR in the colvars/colvars repo to enable useGrids off. it might be too late to include it in this year’s lammps stable release.

Also, running my colvars-lj minimal example for only 100 steps:

useGrids on
*** x[0] 0.270824 0.0710592 0.108942 f[0] -19.6194 12.2672 -6.04801
*** x[1] 0.913036 0.854345 0.0967073 f[1] -1.0066 17.4302 8.17008
*** x[2] 0.970611 0.0752138 0.937068 f[2] -4.61275 10.1023 -9.53507
Total wall time: 0:00:08

useGrids off
*** x[0] 0.171592 0.0759112 0.0684578 f[0] -14.3263 7.88552 -8.00814
*** x[1] 0.941056 0.851519 0.0747519 f[1] -4.23791 10.3506 4.15868
*** x[2] 1.02938 0.0606518 0.982726 f[2] -0.318681 8.97035 -8.94888
Total wall time: 0:00:08

absolute error (x[0][0]) = 0.270824 - 0.171592 = 0.099232
relative error (x[0][0]) = 0.270824/0.171592-1 = 0.578302 (!!!)
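
The same arithmetic can be applied to all nine coordinates printed above rather than to x[0][0] alone. This is a small sketch using only the values from the two outputs; treating the useGrids off run as the reference follows the calculation in this post and is an assumption, since neither run is a ground truth.

```python
# Coordinates copied from the two 100-step outputs above.
x_on = [
    [0.270824, 0.0710592, 0.108942],
    [0.913036, 0.854345, 0.0967073],
    [0.970611, 0.0752138, 0.937068],
]
x_off = [
    [0.171592, 0.0759112, 0.0684578],
    [0.941056, 0.851519, 0.0747519],
    [1.02938, 0.0606518, 0.982726],
]

# Absolute difference (on - off) and relative difference (on / off - 1).
abs_diff = [[a - b for a, b in zip(row_on, row_off)]
            for row_on, row_off in zip(x_on, x_off)]
rel_diff = [[a / b - 1.0 for a, b in zip(row_on, row_off)]
            for row_on, row_off in zip(x_on, x_off)]

for i in range(3):
    for j in range(3):
        print(f"x[{i}][{j}]: abs = {abs_diff[i][j]:+.6f}, rel = {rel_diff[i][j]:+.6f}")
```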

since there is no measurable difference in running time, my suggestion would be to change the default to useGrids off and update the colvars documentation?

unless there’s something invalid in how i defined the metadynamics rmsd colvar in my minimal example?

srtee · August 10, 2024, 4:04am

From what I understand, you have a short run in which the metadynamics potential changes every step. This is a very bad way to compare the validity and efficiency of gridded vs non-gridded metadynamics.

The comparison won’t be valid because, if the metadynamics bias is changing every step, the system never reaches anything like a steady state with respect to the metadynamics potential. Ideally, a metadynamics system retains its original behaviour right at the start of the run and approaches free diffusion in the RC (edit: RC = “reaction co-ordinate”, or collective variable more generally) once the metadynamics potential converges. Your validation run doesn’t fully sample either of these regimes, so you have insufficient information to conclude that your two results are statistically different. Sure, they differ by 58% in relative terms, but what standard deviation do you expect from a run of 100 steps? Do you expect that number to be substantially smaller for some reason?

Also, a short run does not adequately capture the computational savings from gridded hills. A gridded run requires NG Gaussian evaluations per step if there are NG grid points, whereas an ungridded run requires NH evaluations, where NH is the number of hills deposited up to that point. Over a short run where NH never approaches NG, you should not expect the gridded implementation to show any reduction in computational cost (nor, for that matter, would you expect the metadynamics potential to converge).
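
The cost argument above can be made concrete with a small counting sketch. It simply tallies Gaussian evaluations under the cost model stated in this post (a fixed count equal to the number of grid points per step when gridded, and one evaluation per deposited hill per step when ungridded). The grid size of 1000 points is a hypothetical value, and the function name is made up for illustration; nothing here is measured from Colvars.

```python
def gaussian_evals(n_steps, hill_every, n_grid):
    """Cumulative Gaussian evaluations under the cost model described above.

    Returns (gridded, ungridded) totals after n_steps, with one hill
    deposited every hill_every steps.
    """
    gridded = n_grid * n_steps
    ungridded = 0
    n_hills = 0
    for step in range(1, n_steps + 1):
        if step % hill_every == 0:
            n_hills += 1          # a new hill is deposited on this step
        ungridded += n_hills      # every existing hill is evaluated
    return gridded, ungridded

# Hypothetical grid of 1000 points, one hill per step as in the short test run.
for n_steps in (100, 10_000):
    g, u = gaussian_evals(n_steps, hill_every=1, n_grid=1000)
    print(f"{n_steps} steps: gridded = {g}, ungridded = {u}")
```

With these assumed numbers the ungridded count is far smaller after 100 steps and only overtakes the gridded count once the number of hills has grown well past the number of grid points.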

alphataubio · August 10, 2024, 6:24am

thanks @srtee.

to be continued in Metadynamics convergence / explicit water NVT and NPT, since this is no longer about lammps development but more of a beginner’s question.