Fix print timestep variable returned a bad timestep

Anshul_DSP · August 11, 2021, 1:26pm

Hi

I am planning to run a long simulation (more than 2**31 steps) for a very small 2D system with LJ interactions.
Here is the snippet from the input file:

<BASH>
MDsteps=900000000
dt=0.005
<PATH> -in in.KALJ_nvt_Nose    -var fname $filename -var tempval $temp -var runleng $MDsteps -var tstep $dt 

<in.KALJ_nvt_Nose>
log data_${fname}
#log none 
units		lj
dimension	2
atom_style	atomic

pair_style      lj/cut   2.50    # Define interaction potential.
read_data       ${fname}            # initconf_T05eq.data # read data file (incl.mass info)

*pair_coeff  and mass details*    
  
fix     1   all enforce2d
change_box  all triclinic

neighbor        0.3 bin
neigh_modify    every 1 delay 0 check yes # Update neighbor

timestep    ${tstep} #0.005   

#prints only if also some MD step run command follows
compute     msd all msd com yes
variable	vstep	equal	step*dt
variable	vpe		equal	pe
variable	vpress	equal	press 
variable	vtemp	equal	temp
variable    vke		equal	ke
variable    vmsd	equal	c_msd[4]
thermo_style    custom step pe press temp ke c_msd[4]
#fix 1 all nvt temp 10.00 10.00 $(100.0*dt)
variable        s equal logfreq(1,1,2)
restart         v_s logscale-${fname}-*.restart
restart          500000    linearscale0-${fname}.restart linearscale1-${fname}.restart
fix 		    data  all print v_s " ${vstep}  ${vpe}  ${vpress}  ${vtemp}  ${vke}   ${vmsd}"     append thermo_data-logscale-${fname}.txt
fix 		    data1 all print 500000 " ${vstep}  ${vpe}  ${vpress}  ${vtemp}  ${vke}   ${vmsd}"  append thermo_data-linscale-${fname}.txt 
#set numerical integrator
fix nose all nvt temp ${tempval} ${tempval} $(100.0*dt)
run ${runleng} 
run ${runleng} 
run ${runleng} 
run ${runleng} 
run ${runleng} 
....

During my run, the simulation stops after MD step 536870912.
The thermo data is written up to time (= MD_step*dt) 5367500, so the run must have crashed somewhere between 5367500 and 5368000.

The error is the following:
ERROR: Fix print timestep variable returned a bad timestep (…/fix_print.cpp:177)
Last command: run ${runleng}

I don’t understand the reason, since each run is shorter than 2**31 steps and the simulation is split into multiple runs. Any feedback would be greatly helpful.

Regards,
Anshul

akohlmey · August 11, 2021, 2:02pm

Which version of LAMMPS is this with?

Anshul_DSP · August 11, 2021, 2:25pm

lammps-3Mar20

akohlmey · August 11, 2021, 4:58pm

Thanks. As far as fix print is concerned, that version is new enough not to have an issue with 64-bit timestep numbers. But there could be an issue elsewhere. I will have to run some tests myself and confirm whether this may already be corrected in the latest patch version. We’re currently hosting a LAMMPS conference, so my time is a bit limited. Will respond here when I have more news.

As a workaround, I would suggest running the simulation in chunks so that the timestep number remains below 2 billion: at the end of each chunk, close all outputs and timestep-related functionality (dumps, fixes, etc.), use reset_timestep 0, then enable the dumps/fixes again and run the next chunk.
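The chunking idea can be sketched with a small Python helper that prints the sequence of LAMMPS commands. This is only an illustration, not a tested input: the function name `chunked_run_commands` and its parameters are hypothetical, the fix IDs `data` and `data1` are taken from the input file above, and the 2 billion limit is the figure quoted in this post. `unfix`, `reset_timestep` and `run` are standard LAMMPS commands; whether every fix and compute in a given input tolerates `reset_timestep` has to be checked separately.

```python
def chunked_run_commands(run_length, n_runs, limit=2_000_000_000,
                         fix_ids=("data", "data1")):
    """Return LAMMPS input lines that reset the step counter before it
    would exceed `limit` (hypothetical helper, for illustration only)."""
    if run_length > limit:
        raise ValueError("a single run already exceeds the limit")
    lines = []
    step = 0  # step counter as LAMMPS would see it
    for _ in range(n_runs):
        if step + run_length > limit:
            # Close the step-dependent outputs, then restart the counter.
            lines += [f"unfix {fid}" for fid in fix_ids]
            lines.append("reset_timestep 0")
            lines.append("# re-issue the fix print / restart commands here")
            step = 0
        lines.append(f"run {run_length}")
        step += run_length
    return lines


if __name__ == "__main__":
    # Five runs of 900000000 steps, as in the input file above.
    print("\n".join(chunked_run_commands(900_000_000, 5)))
```

With these numbers two runs fit into each chunk (1.8 billion steps) before a reset is needed. Note that after each reset the printed step*dt value starts again from zero, so an offset has to be added when post-processing the output files.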

Anshul_DSP · August 11, 2021, 5:17pm

Thanks!

I am only using a system of 100 particles, so I hit this error after about 2 hours of running on a single core.
The MD step corresponds to 2^29, which is still less than 2^31, so it should be possible to print the timestep for 2^30 at least.

I use several shorter “run” commands, each of length MDsteps = 900000000 ≈ 2^29.745.

Resetting the timestep does make sense as a workaround; however, the simulation setup won’t be clean :(. I wanted to ask in case I am making some silly mistake. We can surely follow up once you have time to check it.

Thanks for your consideration.
Regards,
Anshul

akohlmey · August 11, 2021, 7:57pm

Just made a simple test with an input doing nothing and the current patch version of LAMMPS (30 July 2021) works fine. After some careful checking, this looks like a bug that has been fixed after your version was released. The code for the logfreq() and similar functions in the variable command was not made 64-bit safe until rather recently. With 32-bit integer math it will overflow and return an invalid negative value.
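To make the overflow concrete, here is a small Python sketch. It is not the LAMMPS source code: `next_logfreq_base2` and `wrap_int32` are hypothetical helpers, and the sketch assumes that logfreq(1,1,2) returns the next power of two after the current step and that the old code stored this result in a signed 32-bit integer.

```python
def wrap_int32(value):
    """Wrap a Python int into the signed 32-bit range, mimicking
    two's-complement overflow (assumption about the old behavior)."""
    return (value + 2**31) % 2**32 - 2**31


def next_logfreq_base2(step):
    """Next output step for logfreq(1,1,2): the smallest power of two
    strictly greater than `step` (exact, unbounded arithmetic)."""
    nxt = 1
    while nxt <= step:
        nxt *= 2
    return nxt


if __name__ == "__main__":
    dt = 0.005  # timestep from the input file above
    for step in (2**29, 2**30):
        exact = next_logfreq_base2(step)
        print(f"step={step} time={step * dt:.2f} "
              f"next(exact)={exact} next(int32)={wrap_int32(exact)}")
```

Under these assumptions the output at step 2^29 = 536870912 still gets a valid next step (2^30), but at step 2^30 = 1073741824 the next value 2^31 no longer fits and wraps to -2147483648, which fix print rejects as a bad timestep. That step corresponds to time 2^30 * 0.005 ≈ 5368709, which is after the last linear-scale output reported above (time 5367500, step 1073500000) and is consistent with step 536870912 being the last log-scale output before the crash.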

Anshul_DSP · August 11, 2021, 8:10pm

Ohh, fantastic.

I will check that too. I have edited and added a new potential file, so I will need to edit that too, but first I will check my script for this point.

Will get back to you.

Regards,
Anshul

akohlmey · August 11, 2021, 8:49pm

Here is my little test script:

region box block 0 1 0 1 0 1
create_box 1 box
create_atoms 1 single 0.0 0.0 0.0

pair_style zero 0.001
pair_coeff * *
mass * 1.0
neigh_modify once yes

variable s equal logfreq(1,1,2)
fix 1 all print v_s "$(step)"

run 2000000000

Anshul_DSP · August 22, 2021, 2:36pm

Yes, it is good now.

Thanks!