[lammps-users] bug in restart file

Lukas_Wittwer · October 6, 2009, 6:55am

Hi,

I'm running simulations which consist of the following steps:

1) heating my system using the temp/berendsen fix and NVE fix
2) then equilibrate it using the NVT fix
3) and subsequently equilibrate the system using the NVE fix.

When I perform the steps above in one single run everything works out fine, but when I restart the simulation from a restart file written after step 1 the result is totally different.
What can I do to prevent that behavior?
Thanks in advance for any help.

Greets Lukas

akohlmey · October 6, 2009, 7:28am

> Hi,

lukas,

> I'm running simulations which consist of the following steps:
>
> 1) heating my system using the temp/berendsen fix and NVE fix
> 2) then equilibrate it using the NVT fix
> 3) and subsequently equilibrate the system using the NVE fix.
>
> When I perform the steps above in one single run everything works out
> fine, but when I restart the simulation from a restart file written
> after step 1 the result is totally different.

please explain what you mean with a "totally different" result.
what is different and how? ... also if you think there is a
bug, please provide a way to reproduce it. your description
is far too general. i don't see any problems with restarting
in my simulations, for example, and i am very certain that
others would have noticed, too, if something as fundamental
would be broken.

> What can I do to prevent that behavior ?

you first have to convince us that there is something wrong.

cheers,
axel.

Lukas_Wittwer · October 6, 2009, 8:10am

Axel Kohlmeyer wrote:

> > Hi,
>
> lukas,
>
> > I'm running simulations which consist of the following steps:
> >
> > 1) heating my system using the temp/berendsen fix and NVE fix
> > 2) then equilibrate it using the NVT fix
> > 3) and subsequently equilibrate the system using the NVE fix.
> >
> > When I perform the steps above in one single run everything works out
> > fine, but when I restart the simulation from a restart file written
> > after step 1 the result is totally different.
>
> please explain what you mean with a "totally different" result.
> what is different and how? ... also if you think there is a
> bug, please provide a way to reproduce it. your description
> is far too general. i don't see any problems with restarting
> in my simulations, for example, and i am very certain that
> others would have noticed, too, if something as fundamental
> would be broken.

I'm simulating thiols on a gold surface. When running it in a single run my system is well ordered (herringbone) but when restarting it from a restart file the system happens to be disordered.

akohlmey · October 6, 2009, 10:02am

> Axel Kohlmeyer wrote:
>
> [...]
>
> > please explain what you mean with a "totally different" result.
> > what is different and how? ... also if you think there is a
> > bug, please provide a way to reproduce it. your description
> > is far too general. i don't see any problems with restarting
> > in my simulations, for example, and i am very certain that
> > others would have noticed, too, if something as fundamental
> > would be broken.
>
> I'm simulating thiols on a gold surface. When running it in a single run
> my system is well ordered (herringbone) but when restarting it from a
> restart file the system happens to be disordered.

so please provide a set of input files that reproduce this.

if you don't make it easy to track down your problem, nobody
will do it for you.

the fact that you see a difference is no ultimate proof that
there is a bug. it could just as well mean, that you are
making a mistake in your lammps scripts. please also keep in
mind that the writing to restart files is scattered throughout
many of the classes of lammps, so there can be a problem
everywhere, but also, if your restart contains data that
the read_restart is not expecting, it can corrupt your restart.

cheers,
axel.

Lukas_Wittwer · October 6, 2009, 11:03am

Axel Kohlmeyer wrote:

data.layer_bpdt_bcc_neutr_15x18 (1.02 MB)

in.au_bpdt_bcc_large_run14 (7 KB)

in.restart_run14_heat_100 (2.45 KB)

sjplimp · October 6, 2009, 11:52am

Check the thermo output before and after restarts
and compare it to continuous runs that didn't restart.
If there are tiny differences that diverge slowly
over time, then that is normal and should lead to
nothing worse than 2 simulations that are the
same in a statistical sense. If they are rapidly
radically different then there may be something
wrong with how you are restarting.
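
The comparison described above could be sketched roughly as follows. This is
only an illustration in Python, not a LAMMPS tool: the function names are
hypothetical, the energy values are made-up placeholders (not taken from this
thread), and it assumes the same thermo column has already been extracted
from both logs at the same timesteps.

```python
def relative_differences(run_a, run_b):
    """Per-sample relative difference between two equally sampled thermo columns."""
    return [
        abs(a - b) / max(abs(a), abs(b), 1e-300)  # guard against division by zero
        for a, b in zip(run_a, run_b)
    ]


def first_step_exceeding(run_a, run_b, rel_tol=1e-6):
    """Index of the first sample whose relative difference exceeds rel_tol, else None."""
    for idx, diff in enumerate(relative_differences(run_a, run_b)):
        if diff > rel_tol:
            return idx
    return None


# Placeholder numbers for illustration only (NOT data from this thread).
continuous = [-1052.31, -1052.30, -1052.28, -1052.25]
restarted_ok = [-1052.31, -1052.30, -1052.28, -1052.24]   # drifts apart slowly
restarted_bad = [-873.90, -880.12, -891.47, -902.03]      # different from the start

if __name__ == "__main__":
    print("slow drift first exceeds tolerance at sample:",
          first_step_exceeding(continuous, restarted_ok))
    print("bad restart first exceeds tolerance at sample:",
          first_step_exceeding(continuous, restarted_bad))
```

A mismatch at the very first sample after the restart points at the restart
setup itself, while a mismatch that only shows up later is more likely the
normal slow divergence.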

Steve

akohlmey · October 10, 2009, 9:15pm

lukas,

it looks as if there is either some non-initialized memory
or memory corruption going on. i have not been able to track
this down, but i get very different initial energies depending
on whether i run in serial or parallel and on what platform
i am running on. this needs some systematic testing...

cheers,
axel.

Lukas_Wittwer · October 13, 2009, 12:46pm

hi,

I ran my simulations, both the restarted run and the single run, on the same number of CPUs (4) and on the same machine.

greets,

Lukas

sjplimp · October 14, 2009, 2:11pm

I looked at your input files today. The 2 input scripts
are 350 and 100 lines long and run for 100s of 1000s
of timesteps. If you want me to debug something you'll
need to simplify it considerably. I need the smallest
script and smallest data file that will reproduce the
problem.

The current version of LAMMPS won't even read your
data file, so I suggest you upgrade to that version first.

Also, I notice that your 2nd script which reads the
restart file, also resets some velocities to 0 before doing a run.
The first input script does not do this after writing
the restart file. So why would you expect the restarted
run to give the same dynamics? Are those velocities
guaranteed to already be 0 ?

Steve

Lukas_Wittwer · October 15, 2009, 7:55am

Here are the input files for a small test run. I can run it on the latest version (7Jul09) of LAMMPS.
Hope you can help me - thanks in advance

Lukas

data.layer_bpdt_bcc_neutr4x4 (52.5 KB)

in.restart_test11 (1.9 KB)

in.test11 (2.25 KB)

_Li_Weina · October 15, 2009, 8:21am

Hello, Lukas.
Sorry I can't help here, but i have a little doubt about your input.
You read initial atom coordinates, is there any need to use lattice
command again? I also read_data, but I didn't lattice it, hope my
results aren't wrong.
Hope you can solve your problem.
Best regards,
weina

sjplimp · October 16, 2009, 1:35pm

ok - I'll take another look, but it will probably be next week

Steve

sjplimp · October 30, 2009, 10:48pm

Your question was why you are getting different thermo output when
you do a restart vs continue a run.

In your restart file you have these lines:

pair_coeff 1 * lj/charmm/coul/long 0.25 3.5635948725613571
pair_coeff 2 * lj/charmm/coul/long 0.086000000128358844 3.3996695079448309
pair_coeff 3 * lj/charmm/coul/long 0.015000000064220668 2.5996424587350853
pair_coeff 4 * lj/charmm/coul/long 0.015700000004219245 1.0690784617205229
pair_coeff 5 * lj/charmm/coul/long 0.25 3.5635948725613571

If you replace them with these lines, you will get the same thermo output
as in your original run:

pair_coeff 1 1 lj/charmm/coul/long 0.25 3.5635948725613571
pair_coeff 2 2 lj/charmm/coul/long 0.086000000128358844 3.3996695079448309
pair_coeff 3 3 lj/charmm/coul/long 0.015000000064220668 2.5996424587350853
pair_coeff 4 4 lj/charmm/coul/long 0.015700000004219245 1.0690784617205229
pair_coeff 5 5 lj/charmm/coul/long 0.25 3.5635948725613571

That is what you effectively had in the original run (pair coeffs read from
the data file). In that case, LAMMPS will perform mixing to generate
I,J coeffs.
In the 1 * case, you are setting the 1,2 value (and 1,3 and 1,4 etc) to be the
same as 1,1, which is not the same as mixing.

Hence your vdwl thermo output was totally different.
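
To make the difference concrete, here is a small Python sketch (not LAMMPS
code) using the I,I values from the lines above. It assumes arithmetic mixing,
i.e. eps_ij = sqrt(eps_i * eps_j) and sigma_ij = (sigma_i + sigma_j) / 2,
which as far as I know is the default for the lj/charmm styles; if a different
mix rule is set via pair_modify the numbers change, but the point stays the
same. The function names are made up for illustration.

```python
import math

# I,I coefficients from the pair_coeff lines above: type -> (epsilon, sigma)
diag = {
    1: (0.25, 3.5635948725613571),
    2: (0.086000000128358844, 3.3996695079448309),
    3: (0.015000000064220668, 2.5996424587350853),
    4: (0.015700000004219245, 1.0690784617205229),
    5: (0.25, 3.5635948725613571),
}


def mix_arithmetic(i, j):
    """I,J coefficients under the ASSUMED arithmetic mixing rule."""
    eps_i, sig_i = diag[i]
    eps_j, sig_j = diag[j]
    return math.sqrt(eps_i * eps_j), 0.5 * (sig_i + sig_j)


def wildcard(i, j):
    """I,J coefficients when every 'pair_coeff I * ...' line copies the I,I
    values to all I,J with J >= I (no mixing takes place)."""
    return diag[min(i, j)]


if __name__ == "__main__":
    for i, j in [(1, 2), (1, 3), (2, 4)]:
        print(f"{i},{j}  mixed: {mix_arithmetic(i, j)}  wildcard: {wildcard(i, j)}")
```

For every unlike pair the two columns disagree, which is why the van der Waals
energy changes as soon as the wildcard lines are used.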

I also found a bug with hybrid neighbor lists for the (somewhat odd) case
you are modeling with a hybrid pair style
where 1,1 and 2,2 interactions are with one potential,
but 1,2 is with another. I'll post a patch for that. But it wasn't the
reason for your problem.

Steve

Lukas_Wittwer · November 2, 2009, 9:21am

Hi,

thank you very much for your help. It seems to work now :).

greets

Lukas

Steve Plimpton wrote: