simulation using ReaxFF gives nan

Dear All
I am simulating gold nanoparticle with hexanthiol using ReaxFF. Simulation is goes well in other

data.goldnp_hexan_nonbond_reax

in.goldnp_hexan_nonbond

ffield.reax

lmp_control
two cases (say LJ and EAM-morse-LJ) but for ReaxFF after few ps it gives me nan, even I reduced the time step but still i am getting nan. I saw that two thoils are moving away from the gold surface so I tried by fixing S atom of thoils now its not moving away from the surface but its gives nan after 220 ps. Could you give me some advice?
Herewith I attached the necessary files.
Thanking you in advance
Regards
Vasu


Dear All
I am simulating gold nanoparticle with hexanthiol using ReaxFF. Simulation
is goes well in other
data.goldnp_hexan_nonbond_reax<https://docs.google.com/file/d/0B7-kvWklhG-GLUZSSjk3dGtXUnc/edit?usp=drive_web>

in.goldnp_hexan_nonbond<https://docs.google.com/file/d/0B7-kvWklhG-GZnBtWi1aS3p0NkE/edit?usp=drive_web>

ffield.reax<https://docs.google.com/file/d/0B7-kvWklhG-GdlI1Nmh5cExsNGc/edit?usp=drive_web>

lmp_control<https://docs.google.com/file/d/0B7-kvWklhG-GT0hBeWZfTmZQMjQ/edit?usp=drive_web>
  two cases (say LJ and EAM-morse-LJ) but for ReaxFF after few ps it gives
me nan, even I reduced the time step but still i am getting nan. I saw
that two thoils are moving away from the gold surface so I tried by fixing
S atom of thoils now its not moving away from the surface but its gives nan
after 220 ps. Could you give me some advice?

the files are not made public.

there is no point in forcing a system to go where it doesn't "want" to go.
you have understand why that happens (bad parameters, unit settings,
initial conditions, force field)

axel.

Dear Axel
Sorry, Here is the input files.

Thanks
Vasu

data.goldnp_hexan_nonbond_reax (48.7 KB)

ffield.reax.AuSCH (10.6 KB)

in.goldnp_hexan_nonbond (2.98 KB)

lmp_control (1023 Bytes)

your input script indicates that you are using an old version of LAMMPS. please update to the latest and try again. i see only one thiol getting away, and that looks like it is due to a bad initial position.

axel.

out of curiousity, i adjusted your input deck for the current version of LAMMPS, removed the fix spring/self and let your input run on my desktop over night and it looks pretty reasonable. check it out:
http://www.youtube.com/watch?v=wPaqDW5kEUc

axel.

Dear Axel
Thank you very much for your effort. Adjusted input means decreasing the distance between S and Au atoms, Am I correct? I saw that the one which is going away from the surface has larger distance between Au and S.
Thanks and Regards
Vasu

Dear Axel
I adjusted initial position and updated with current version of LAMMPS (6 Apr 2013), removed the fix spring/self. Now thiols are not moving away from the surface but its gives -nan after 200 ps. Once again I attached my input files and log file.

Thanking you in advance
Vasu

lmp_control (1 KB)

in.goldnp_hexan_nonbond (2.31 KB)

ffield.reax.AuSCH (10.6 KB)

data.goldnp_hexan_nonbond_reax (53.9 KB)

Dear Axel
I adjusted initial position and updated with current version of LAMMPS (6
Apr 2013), removed the fix spring/self. Now thiols are not moving away from
the surface but its gives -nan after 200 ps. Once again I attached my input
files and log file.

sorry, but i don't have the time to debug everybody's inputs. you have to
do that by yourself. like with debugging a program you need to observe
closely and narrow down the conditions that cause your problem.

axel.

Dear Axel
I closely observed the behaviour of atoms and it is well behaving. As suggested in the document, I printed the thermodynamic values at every steps and it shows none of them are blowing up but I don´t know why its give nan. Please suggest to solve this issue.

Since the log file is too big, here I pasted the thermodynamic values of last few steps.
TotEng = -87396.3743 KinEng = 802.7029 Temp = 306.0116
PotEng = -88199.0772 E_bond = 0.0000 E_angle = 0.0000
E_dihed = 0.0000 E_impro = 0.0000 E_vdwl = -87124.6401
E_coul = -1074.4371 E_long = 0.0000 Press = 21.9147
---------------- Step 212688 ----- CPU = 14369.9180 (sec) ----------------
TotEng = -87397.0878 KinEng = 802.6043 Temp = 305.9740
PotEng = -88199.6921 E_bond = 0.0000 E_angle = 0.0000
E_dihed = 0.0000 E_impro = 0.0000 E_vdwl = -87125.2503
E_coul = -1074.4418 E_long = 0.0000 Press = 25.6867
---------------- Step 212689 ----- CPU = 14369.9839 (sec) ----------------
TotEng = -87397.8067 KinEng = 802.5751 Temp = 305.9629
PotEng = -88200.3818 E_bond = 0.0000 E_angle = 0.0000
E_dihed = 0.0000 E_impro = 0.0000 E_vdwl = -87125.9349
E_coul = -1074.4469 E_long = 0.0000 Press = 28.8843
---------------- Step 212690 ----- CPU = 14370.0546 (sec) ----------------
TotEng = -nan KinEng = -nan Temp = -nan
PotEng = -88201.1402 E_bond = 0.0000 E_angle = 0.0000
E_dihed = 0.0000 E_impro = 0.0000 E_vdwl = -87126.6878
E_coul = -1074.4524 E_long = 0.0000 Press = -nan
---------------- Step 212691 ----- CPU = 14370.0838 (sec) ----------------
TotEng = -nan KinEng = -nan Temp = -nan
PotEng = -2722.8745 E_bond = 0.0000 E_angle = 0.0000
E_dihed = 0.0000 E_impro = 0.0000 E_vdwl = -542.9343
E_coul = -2179.9402 E_long = 0.0000 Press = -nan

Thanks
Vasu

Dear Axel
I closely observed the behaviour of atoms and it is well behaving. As
suggested in the document, I printed the thermodynamic values at every
steps and it shows none of them are blowing up but I don´t know why its
give nan. Please suggest to solve this issue.

i don't know a solution. something obviously *does* go wrong, but it is not
easy to tell the origin. you will have to follow commonly used debugging
principles. most importantly is to narrow down the boundary conditions to
make it easy to reproduce the problem.

how deterministic is it? i.e. does it always happen at the same point? even
if you run with different number of processors or on a different machine?
if it *is* deterministic, you could try to write out a restart right before
the nan's happen, and see if it carries over when restarting, or better
even when starting from a data file generated with restart2data.

if it is not deterministic, or machine specific, you may want to consider
upgrading/changing compilers, or compile with changed optimization or even
check if you have a hardware issue (e.g. broken memory).

axel.

Dear Axel
I will try as you suggested and let you know.

Thanks
Vasu

Vasumathi,

Using “compute reax all pair reax/c; variable eb equal c_reax[1] …”, pair_style reax/c has the option of writing itemized energy components. Please see the reax/c doc page.

I would pay special attention to the qeq energy (eqeq), since somtimes these NANs can be caused by non-converging charge equilibration calculations.

Ray

Dear Axel

It happens at different point with same number of processors and for different number of processors.

Compilers for reax/c ? because for other forcefields its works fine.

Dear Ray
Charge equilibration is converging well. Please find the log file

Thanks
Vasumathi

lammps.log (14 KB)

Dear Ray
Charge equilibration is converging well. Please find the log file

well, but it *does* show a continuous increase of the charge equilibration
energy, so you may be running into some kind of "polarization catastrophe"
somehow. i am not a reax expert, but such a continuously increasing energy
would make me a bit concerned.

axel.

Dear Axel

Dear Axel
I closely observed the behaviour of atoms and it is well behaving. As
suggested in the document, I printed the thermodynamic values at every steps
and it shows none of them are blowing up but I don´t know why its give nan.
Please suggest to solve this issue.

i don't know a solution. something obviously *does* go wrong, but it is
not easy to tell the origin. you will have to follow commonly used debugging
principles. most importantly is to narrow down the boundary conditions to
make it easy to reproduce the problem.

how deterministic is it? i.e. does it always happen at the same point?
even if you run with different number of processors or on a different
machine?

It happens at different point with same number of processors and for
different number of processors.

but *how* different. for something as finicky as reax, you cannot
expect it to be bitwise identical.

if you create a restart every 10000 step and then restart the
calculation after a crash, does the next crash happen within a few
thousand steps or does it go on for about the same time as your
original run.

if it *is* deterministic, you could try to write out a restart right
before the nan's happen, and see if it carries over when restarting, or
better even when starting from a data file generated with restart2data.

if it is not deterministic, or machine specific, you may want to consider
upgrading/changing compilers, or compile with changed optimization or even
check if you have a hardware issue (e.g. broken memory).

Compilers for reax/c ? because for other forcefields its works fine.

that doesn't mean anything. different parts of a code stress a
compiler differently. reax is a much more complex interaction with
more complex code, so it is much more likely that something breaks
with a broken compiler. mind you, practically *all* compilers are
somewhat broken, especially with high optimization levels, but most of
the time it just doesn't show. now, for a piece of code to be broken
it just needs the compiler to get it wrong somewhere, so a complex
code is more easily miscompiled than a simple code.

as an example, you can compare the code in pair_lj_cut.cpp and
pair_lj_cut_opt.cpp
it does the same thing, but the latter occasionally gets miscompiled
with some compilers under high optimization levels, the former
doesn't. on the other hand, the latter is usually running
significantly faster. so both, the compiler and optimization level as
well as the code make a difference.

axel.

I don’t think Vasumathi is even printing the energies properly in that log file. There she has

variable eqeq 	 equal c_reax[9]

but according to the manual index 9 is for the torsional energy not the qeq energy which is index 14.

Carlos

Dear Carlos
Thank you very much for pointing out my mistake.

Dear Axel and Ray
Please apologize me for my mistake and I will send the correct log file.
Once again extremely sorry for killing your valuable time.
Thanks
Vasumathi

Dear Ray and Axel
Here is the correct log file and corresponding eqeq vs time plot.

Thanks

Eqeq_goldnp.jpg

log.log (32.3 KB)

Dear Axel

Dear Axel

Dear Axel
I closely observed the behaviour of atoms and it is well behaving. As
suggested in the document, I printed the thermodynamic values at every steps
and it shows none of them are blowing up but I don´t know why its give nan.
Please suggest to solve this issue.

i don’t know a solution. something obviously does go wrong, but it is
not easy to tell the origin. you will have to follow commonly used debugging
principles. most importantly is to narrow down the boundary conditions to
make it easy to reproduce the problem.

how deterministic is it? i.e. does it always happen at the same point?
even if you run with different number of processors or on a different
machine?

It happens at different point with same number of processors and for
different number of processors.

but how different. for something as finicky as reax, you cannot
expect it to be bitwise identical.

if you create a restart every 10000 step and then restart the
calculation after a crash, does the next crash happen within a few
thousand steps or does it go on for about the same time as your
original run.

I tried as you suggested by creating a restart every 1000 step. Next crash happen beyond the original run time.
Here are the details of the consecutive runs

  1. 67722 steps (6ps)
  2. 67000-429971 steps (6-42ps)
  3. 428000-1243198 (42-124ps)
  4. 1243000-1706020 (124-170ps)

if it is deterministic, you could try to write out a restart right
before the nan’s happen, and see if it carries over when restarting, or
better even when starting from a data file generated with restart2data.

if it is not deterministic, or machine specific, you may want to consider
upgrading/changing compilers, or compile with changed optimization or even
check if you have a hardware issue (e.g. broken memory).

Compilers for reax/c ? because for other forcefields its works fine.

that doesn’t mean anything. different parts of a code stress a
compiler differently. reax is a much more complex interaction with
more complex code, so it is much more likely that something breaks
with a broken compiler. mind you, practically all compilers are
somewhat broken, especially with high optimization levels, but most of
the time it just doesn’t show. now, for a piece of code to be broken
it just needs the compiler to get it wrong somewhere, so a complex
code is more easily miscompiled than a simple code.

as an example, you can compare the code in pair_lj_cut.cpp and
pair_lj_cut_opt.cpp
it does the same thing, but the latter occasionally gets miscompiled
with some compilers under high optimization levels, the former
doesn’t. on the other hand, the latter is usually running
significantly faster. so both, the compiler and optimization level as
well as the code make a difference.

axel.

axel.

Since the log file is too big, here I pasted the thermodynamic values of
last few steps.
TotEng = -87396.3743 KinEng = 802.7029 Temp =
306.0116
PotEng = -88199.0772 E_bond = 0.0000 E_angle =
0.0000
E_dihed = 0.0000 E_impro = 0.0000 E_vdwl =
-87124.6401
E_coul = -1074.4371 E_long = 0.0000 Press =
21.9147
---------------- Step 212688 ----- CPU = 14369.9180 (sec)

TotEng = -87397.0878 KinEng = 802.6043 Temp =
305.9740
PotEng = -88199.6921 E_bond = 0.0000 E_angle =
0.0000
E_dihed = 0.0000 E_impro = 0.0000 E_vdwl =
-87125.2503
E_coul = -1074.4418 E_long = 0.0000 Press =
25.6867
---------------- Step 212689 ----- CPU = 14369.9839 (sec)

TotEng = -87397.8067 KinEng = 802.5751 Temp =
305.9629
PotEng = -88200.3818 E_bond = 0.0000 E_angle =
0.0000
E_dihed = 0.0000 E_impro = 0.0000 E_vdwl =
-87125.9349
E_coul = -1074.4469 E_long = 0.0000 Press =
28.8843
---------------- Step 212690 ----- CPU = 14370.0546 (sec)

TotEng = -nan KinEng = -nan Temp =
-nan
PotEng = -88201.1402 E_bond = 0.0000 E_angle =
0.0000
E_dihed = 0.0000 E_impro = 0.0000 E_vdwl =
-87126.6878
E_coul = -1074.4524 E_long = 0.0000 Press =
-nan
---------------- Step 212691 ----- CPU = 14370.0838 (sec)

TotEng = -nan KinEng = -nan Temp =
-nan
PotEng = -2722.8745 E_bond = 0.0000 E_angle =
0.0000
E_dihed = 0.0000 E_impro = 0.0000 E_vdwl =
-542.9343
E_coul = -2179.9402 E_long = 0.0000 Press =
-nan

Thanks
Vasu


Dr. Axel Kohlmeyer [email protected]…24… http://goo.gl/1wk0
International Centre for Theoretical Physics, Trieste. Italy.


Dr. Axel Kohlmeyer [email protected]…29… http://goo.gl/1wk0
International Centre for Theoretical Physics, Trieste. Italy.

Thanks
Vasu