I am using LAMMPS 23Jun2022 and COLVARS from 2022-05-09.
I have a simple colvars file used in my simulation. It applies an ABF bias to the distance between the centers of mass of two groups of atoms. I don’t think its details are important, but I can post it if needed.
The lammps input file is:
...
fix abf all colvars abf_BDT.colvars tstat fix_langevin seed 226 output abf
dump dumpProd all xyz 2000000 prod.xyz
thermo 2000000
thermo_modify flush yes
restart 1000000 prod_1.restart prod_2.restart
run 2000000000
run 2000000000
Then the prod.log looks like:
...
colvars: Saving collective variables state to "prod_1.restart.colvars.state".
2144000000 1.0696377 1.6489294 0.7622016
colvars: Synchronizing (emptying the buffer of) trajectory file "abf.colvars.traj".
colvars: Saving collective variables state to "prod_1.restart.colvars.state".
colvars: Synchronizing (emptying the buffer of) trajectory file "abf.colvars.traj".
colvars: Saving collective variables state to "prod_1.restart.colvars.state".
2146000000 1.0241619 1.6144459 1.8996601
colvars: Synchronizing (emptying the buffer of) trajectory file "abf.colvars.traj".
colvars: Saving collective variables state to "prod_1.restart.colvars.state".
2148000000 0.94262713 1.2080264 0.7647839
2150000000 1.0375809 1.3780924 0.82843873
2152000000 0.96214974 1.6101656 1.4263244
2154000000 1.0146399 1.3333802 1.1378838
2156000000 1.0618743 1.6131358 0.9176836
2158000000 1.0499825 1.3201005 0.86687496
2160000000 1.0026682 1.573913 1.094465
...
and
$ tail abf.colvars.traj
2147474000 4.05303602826298e+00
2147475000 5.64231016584322e+00
2147476000 1.18513042521774e+01
2147477000 1.68327539999665e+01
2147478000 1.69308585151818e+01
2147479000 1.50852408863132e+01
2147480000 1.59986091171063e+01
2147481000 1.42899089811339e+01
2147482000 6.74848594216875e+00
2147483000 2.68667889444293e+00
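So it looks like colvars just “died” quietly: the thermo output continues, but the colvars messages stop and abf.colvars.traj ends at step 2147483000. I don’t know how lammps works internally, though, and it’s hard for me to imagine that one part of a running program would die unless it’s a separate thread.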
My colvars dumpstep is 1000, and log2(2147483000) < 31 but log2(2147484000) > 31, so it (most likely) happened right after step 2^31.
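For reference, the boundary arithmetic can be checked with a few lines of Python. This is only a minimal sketch; the variable names are mine, and the two step values are the last step present in abf.colvars.traj and the next multiple of 1000 that was never written.

```python
# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1

last_written = 2147483000   # last step present in abf.colvars.traj
next_expected = 2147484000  # next multiple of 1000, which never appears

# The last written step still fits in a signed 32-bit int ...
fits_before = last_written <= INT32_MAX
# ... but the next expected output step does not.
fits_after = next_expected <= INT32_MAX

print(INT32_MAX)     # 2147483647
print(fits_before)   # True
print(fits_after)    # False
```

So the trajectory output stops at the first output step that no longer fits in 31 bits plus sign.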
After finding this, I originally thought colvars just used a regular int for the step, which would be unusual, but ok fine, fixable. However, they have a special type declared at colvarmodule.h:99: typedef long long step_number; and they seem to use it in all the right places. So I am not sure what the issue might be. Perhaps something non-obvious is declared as a simple int and gets assigned the timestep, which then breaks the whole module. But I am not sure how I would look for it without compiling everything in debug mode, running for 2bln steps (or setting the timestep to something like 2^31 - 10 might be enough), and looking at what happens after it crosses 2^31.
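To illustrate the kind of failure I have in mind, here is a small Python sketch of what happens when a 64-bit step counter is narrowed to a signed 32-bit int. It is an assumption about a generic two's-complement narrowing, not a claim about what the Colvars code actually does, and as_int32 is a hypothetical helper I made up for the example.

```python
def as_int32(step: int) -> int:
    """Return the value a signed 32-bit int would hold after a
    two's-complement narrowing of `step` (hypothetical helper)."""
    return ((step + 2**31) % 2**32) - 2**31

# Below 2^31 the narrowed value is unchanged.
print(as_int32(2147483000))   # 2147483000

# At 2^31 and beyond it wraps around to a large negative number.
print(as_int32(2**31))        # -2147483648
print(as_int32(2147484000))   # -2147483296
```

If some internal variable behaved like this, any comparison against the wrapped value (for example "has the step advanced since the last output?") could stop being true without producing an error message, which would match the silent behavior above.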
So I am now looking for a way to run with colvars for more than 2^31 steps. I could try to put the timestep back to 0 after run 2000000000, but this would make log-files harder to parse so I’d prefer to avoid it if possible.