[lammps-users] mod. LAMMPS to run for > 2^31 timesteps?

_Robert_Hoy · December 4, 2010, 1:17am

Hi. I run a lot of long, small sims. of particles with “stiff” interactions that require small timesteps. I plan many runs of up to O(10^11) timesteps. The traditional approach to these within LAMMPS has been to stop every 2 billion steps, reset the timestep to zero, and restart. But for long runs, the restarting and output-file-handling required by this approach seem (to me) more work than modifying the whole code to allow runs of > 2^31 steps. I did this for a 2009 version and got it working except for a prob. with restart files, which would presumably be not too hard to fix.

Of course, that meant I was ‘stuck’ with that modified version. And have since started running newer versions of the code…

So, Steve, the question is, if I make the necessary mods. to the current version to allow running for > 2^31 steps, will you integrate these into “stock” LAMMPS and maintain them? The mods mostly involve changing variables in the code from ‘int’ to ‘long long int’ type.

The only problem with this that jumps out at me is that some systems (32 bit? or is it “worse” than that?) differ in their definitions of long long int. (of course, guess 32 bit has to use tricks to handle ints > 2^31). I guess it might also create a problem for “user” packages, but perhaps the ability to run for > 2^31 steps could BE a user package.

Thoughts?

Thanks,
Rob

akohlmey · December 4, 2010, 1:34am

hi rob,

> Hi. I run a lot of long, small sims. of particles with "stiff" interactions
> that require small timesteps. I plan many runs of up to O(10^11)
> timesteps. The traditional approach to these within LAMMPS has been to stop
> every 2 billion steps, reset the timestep to zero, and restart. But for
> long runs, the restarting and output-file-handling required by this approach
> seem (to me) more work than modifying the whole code to allow runs of > 2^31
> steps. I did this for a 2009 version and got it working except for a prob.
> with restart files, which would presumably be not too hard to fix.

> Of course, that meant I was 'stuck' with that modified version. And have
> since started running newer versions of the code..

depending on how deep your modifications go, this might be
fairly straightforward to maintain using git instead of the regular
patching. perhaps having this integrated into lammps-icms to
have it tested by some people outside of yourself might be
an option to demonstrate the stability of this kind of change.

i already have a selection of add-on features in the branch that were
not written, or only partially written, by me, and there seem to be
some people using it (including people in our group and myself).

if it would be too much work to keep it up-to-date, then it would be
fairly easy to remove (much easier than in any other SCCS that i have
used in my career so far).

> So, Steve, the question is, if I make the necessary mods. to the current
> version to allow running for > 2^31 steps, will you integrate these into
> "stock" LAMMPS and maintain them? The mods mostly involve changing
> variables in the code from 'int' to 'long long int' type.

hmmm... long long int is not overly portable. the better way
would be to use a typedef (step_counter_t) and then make that depend
on whether a compiler/platform supports it, or on a simple compile
time flag, and perhaps use casts to double for output etc.
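
A minimal C++ sketch of the typedef idea, for illustration only. The name step_counter_t is the one suggested above; the STEP_COUNTER_32 macro and the helpers advance() and step_for_output() are hypothetical names invented for this sketch and are not LAMMPS identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical compile-time switch (an assumption of this sketch):
// build with -DSTEP_COUNTER_32 to fall back to a 32-bit counter.
#ifdef STEP_COUNTER_32
typedef std::int32_t step_counter_t;
#else
typedef std::int64_t step_counter_t;
#endif

// Advance the counter by n steps. All arithmetic stays in step_counter_t,
// so nothing is silently narrowed to a plain int.
inline step_counter_t advance(step_counter_t step, step_counter_t n) {
    assert(n >= 0);
    return step + n;
}

// Cast to double for output, as suggested above. The conversion is
// exact for step counts up to 2^53.
inline double step_for_output(step_counter_t step) {
    return static_cast<double>(step);
}
```

With the default 64-bit typedef, advance(2147483647, 1) gives 2147483648 instead of overflowing, and printing goes through step_for_output() so the format string does not depend on the width of the counter.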

> The only problem with this that jumps out at me is that some systems (32
> bit? or is it "worse" than that?) differ in their definitions of long long
> int. (of course, guess 32 bit has to use tricks to handle ints > 2^31). I
> guess it might also create a problem for "user" packages, but perhaps the
> ability to run for > 2^31 steps could BE a user package.

that would be messy, since you have to modify internals.

axel.

_Robert_Hoy · December 4, 2010, 3:27am

Thanks for the advice, Axel! I’d never heard of ‘git’ or lammps-icms or SCCS before - will check these out. The typedef idea also makes sense.

Best,
Rob

sjplimp · December 4, 2010, 2:35pm

This is a change I want to make at some point. I would do it
using uint64_t which is available everywhere so far as I know.

However there are lots of subtle places where timesteps are
used and stored. So I don't think it is a trivial change.

Steve

_Robert_Hoy · December 4, 2010, 6:09pm

Hi, Steve. Yup, when I tried to do it before, found I had to modify many .cpp files, so know it’s not trivial. And I only tested it with one pair style and integrator. But I guess, the question is, should I:

a) take the time to try doing this in a maximally-portable fashion, using the helpful advice of you and Axel, test it myself, and then put it up for others to test more broadly,

or

b) just (for now) make a kludge for personal use, and wait for you to do (a)?

Obviously you could do it in a shorter time than I, but understand you may not have the time to do it now.

Thanks,
Rob

sjplimp · December 6, 2010, 3:32pm

(b)

There are enough places in the code it touches, that
I'll want to do it myself.

Thanks,
Steve

sjplimp · January 11, 2011, 12:48am

I posted a 13Jan11 patch that should enable
timestepping with 64-bit ints. So until you hit 2^63 steps,
you're good Rob. Hopefully that won't be any time soon ...

Try it out and see if it works as expected. You can use
reset_timestep to start at a step near 2^31 and see what
happens. Wouldn't be surprised if there are still a
few 32-bit storage locs or arithmetic for timesteps that
I missed.
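
The reason for starting near 2^31 is that any leftover 32-bit location breaks as soon as the step passes 2^31 - 1. A small C++ sketch of that boundary follows; the names kMax32, fits_in_32bit() and steps_before_32bit_overflow() are hypothetical and only illustrate the arithmetic, not anything in the LAMMPS source.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Largest value a signed 32-bit location can hold: 2^31 - 1.
const std::int64_t kMax32 = std::numeric_limits<std::int32_t>::max();

// True if a 64-bit timestep would survive being stored in a 32-bit int.
inline bool fits_in_32bit(std::int64_t step) {
    return step >= std::numeric_limits<std::int32_t>::min() && step <= kMax32;
}

// Number of steps a run starting at `start` can take before a leftover
// 32-bit location would no longer be able to hold the timestep.
inline std::int64_t steps_before_32bit_overflow(std::int64_t start) {
    assert(fits_in_32bit(start));
    return kMax32 - start;
}
```

For example, a test that resets the timestep to 2147483000 crosses the boundary after only 647 further steps, so a short run is enough to exercise any 32-bit storage that was missed.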

Steve

_Robert_Hoy · January 11, 2011, 2:39am

Thanks, Steve! I’ll download this and try it out ASAP, and report back to you.

Rob

_Robert_Hoy · January 25, 2011, 7:55pm

Hi, Steve. I ran into problems trying to start a run of > 2^31 - 1 = 2147483647 steps. If I end an input script with ‘run 2100000000’, the run starts normally. If I end it with ‘run 2200000000’, LAMMPS exits with “ERROR: Invalid run command N value”.

This error was generated by lines 107-112 in run.cpp:

// set nsteps as integer, using upto value if specified

int nsteps;
if (!uptoflag) {
  if (nsteps_input < 0 || nsteps_input > MAXSMALLINT)
    error->all("Invalid run command N value");
  …

This bit of code confuses me; looking at lmptype.h, I’d have thought nsteps would be a ‘bigint’ and an error message should only be generated here if nsteps_input were greater than MAXBIGINT. Am I on track with this?
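
To make the two readings of that check concrete, here is a stand-alone C++ sketch that validates a run length against a chosen upper limit. It is not the run.cpp code: parse_nsteps(), kMaxSmallInt and kMaxBigInt are hypothetical names, and the two limit values are assumed here to be the signed 32-bit and 64-bit maxima.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <limits>

// Stand-ins for the two limits discussed above (values assumed here).
const std::int64_t kMaxSmallInt = std::numeric_limits<std::int32_t>::max();
const std::int64_t kMaxBigInt = std::numeric_limits<std::int64_t>::max();

// Parse a run length in 64-bit arithmetic and check it against `limit`.
// Returns the value if 0 <= value <= limit, or -1 if the text is not a
// valid number or is out of range.
inline std::int64_t parse_nsteps(const char *text, std::int64_t limit) {
    assert(text != nullptr);
    errno = 0;
    char *end = nullptr;
    const long long value = std::strtoll(text, &end, 10);
    if (errno == ERANGE || end == text || *end != '\0') return -1;
    if (value < 0 || value > limit) return -1;
    return value;
}
```

With the 32-bit limit, "2100000000" is accepted and "2200000000" is rejected, which matches the behavior reported above. With the 64-bit limit both values pass.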

Thanks,
Rob

PS - The above is for the 15 Jan11 version. The 18 Jan11 tarball I downloaded today seems for some reason to lack the styh files and won’t run - it always (unsurprisingly) gives “Invalid pair style” errors. Maybe the styh files were inadvertently excluded?

sjplimp · January 26, 2011, 3:07pm

I'll take a look at the timestep issue. Re: the style files, they
are auto-generated so they shouldn't be part of the distro.
If anyone else has trouble building from the most current
tarball let me know. Sometimes other problems creep
in if something is forgotten.

Steve

_Robert_Hoy · January 28, 2011, 1:52pm

Hi. If I put two back to back commands of ‘run 2000000000’ in a script, the simulation runs fine (and restart files past step 2^31 are usable). In contrast, one command of ‘run 4000000000’ gives the “Invalid run command N value” error noted earlier.

Think I found the source of this. In update.h, nsteps is an int. Here are lines 27-31 of update.h:

bigint ntimestep; // current step (dynamics or min iterations)
int nsteps; // # of steps to run (dynamics or min iter)
int whichflag; // 0 for unset, 1 for dynamics, 2 for min
bigint firststep,laststep; // 1st & last step of this run
bigint beginstep,endstep; // 1st and last step of multiple runs
int first_update; // 0 before initial update, 1 after

So the behavior seen above makes sense - but is there a reason nsteps can’t / shouldn’t be a bigint, when ntimestep, firststep, etc. are bigints?

Thanks,
Rob

sjplimp · January 28, 2011, 2:24pm

Now I see what you're doing/asking.

You cannot do "run N" where N > 2^31. This
would take changes in several related commands,
e.g. temper, prd, tad, minimize, etc and isn't
really related to the internal timestep being
64-bit. In this syntax N is not a timestep but
a number of timesteps.

But what you can do, to accomplish the same thing
is:

run N upto (now N > 2^31 is OK, since N is now a timestep)

or

run N/10
run N/10
...

where N/10 (or whatever) is < 2^31.

I don't see this as a real limitation, since invoking 1 command
that will run for billions of timesteps seems kind of optimistic
anyway.
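
The chunking workaround is simple arithmetic. A C++ sketch under stated assumptions: split_run() is a hypothetical helper, not part of LAMMPS, and each element of its result corresponds to the N of one "run" command.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Split `total` steps into consecutive run lengths, none larger than
// `max_chunk`. The lengths sum to `total`; only the last may be smaller.
inline std::vector<std::int64_t> split_run(std::int64_t total,
                                           std::int64_t max_chunk) {
    assert(total >= 0 && max_chunk > 0);
    std::vector<std::int64_t> chunks;
    while (total > 0) {
        const std::int64_t n = total < max_chunk ? total : max_chunk;
        chunks.push_back(n);
        total -= n;
    }
    return chunks;
}
```

For instance, split_run(4000000000, 2000000000) yields two chunks of 2000000000 steps each, which is the pair of back-to-back run commands that worked earlier in the thread.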

Steve

_Robert_Hoy · January 28, 2011, 2:40pm

Thanks, Steve. Sounds good - will probably typically want to define new computes, dumps, etc. at intermediate points in the runs, so repeated ‘run N/10’ commands will be fine.

Thanks for all the work!
Rob

_Robert_Hoy · February 10, 2011, 2:39pm

Hi, Steve. Just wanted to let you know I’ve done runs of up to 16 billion steps with no problem, and “custom” dumps started after > 2^31 steps seem to work fine.

Thanks,
Rob

sjplimp · February 10, 2011, 4:26pm

that's good news - thanks

If anyone has run a simulation with LAMMPS longer than 16B steps,
Rob will buy you an adult beverage. Otherwise I owe one to
you Rob.

Steve