lammps crash with AIREBO CH potential

I started using AIREBO and ran into the same problems as previous users have had with it crashing.

I’ve been able to trace the bug in pair_airebo.cpp to the Lennard-Jones calculation of forces that occurs in the bondorderLJ() function, where just before the kinetic energy first spikes, I see the “f2” and “f3” variables going very high. This occurs because “sin321” is calculated to be 0.0007, which is large enough to get through the “if” statement in line 2682 (since it’s greater than TOL=1.0E-9) of the November 17 2016 version of the source code (attached), but small enough when squared to mess up the forces. Richard and Axel also reported this issue on Github, but I guess their fix of putting in that tolerance just wasn’t strict enough. When I change TOL to 1.0E-3, the program runs without crashing.

As further evidence that it’s the Lennard-Jones calculation that contains the bug, when I set the LJ_flag of AIREBO to 0, I also found that the program runs without crashing. Further, the AIREBO/Morse style also runs without crashing.

I’m still concerned about 2 things.

1: Other bugs might be present in the code since Steve Stuart mentioned that there are “several bugs in the force calculation.” If this is just one of several bugs, and catching this one allows the program to run, but it runs incorrectly, it could make matters worse since it leads to false confidence in the code. Steve, do you happen to know in which parts of the code the known errors you mentioned are supposed to be lurking?

2: When TOL is set so low, perhaps calculations are skipped more regularly than they should be. Perhaps a separate tolerance should be set for line 2682 than the other spots? Is it okay for the “if” loop that line 2682 leads into to be skipped every once in a while?

Any thoughts? If other AIREBO users that were getting this error can check whether this fixes their simulations too, that could be helpful.

Efrem Braun

pair_airebo.cpp (137 KB)

Efrem,

For the dihedral angles for near-colinear bonds, an abrupt cutoff will always cause some problems (whether at TOL=1e-9 or 1e-3). If the cutoff is too small, it will miss some cases like yours. If the cutoff is large, then a greater number of systems will routinely encounter energy conservation problems as the dihedral angle potential is switched off abruptly.

What we do in our standalone code is turn the torsional interactions off smoothly using a cubic spline in cos(θ), where θ is either of the bond angles involved in the dihedral, over the range -0.995 ≤ cos(θ) ≤−1. (I.e. turning on at θ ≈ 174.3 degrees). This cubic spline isn’t part of the original AIREBO potential, so this is an “off-label” fix.

But more broadly: yes, there are almost certainly other bugs lurking in the LAMMPS AIREBO code. If I knew exactly where, we’d have fixed them, of course. I’d love to be able to find the time to do a thorough comparison of the LAMMPS and our in-house implementation of AIREBO to track these down, but haven’t managed to do so.

Steve, thanks for that very detailed email. If your standalone code isn’t too different from the LAMMPS implementation, I might be able to do a comparison of the two to see if I can spot the bugs (if you’re willing to share portions of your standalone code, of course).

I’m going to suggest to the LAMMPS developers that a warning be put in the pair_airebo documentation stating that there are known bugs in the implementation. I just put in a pull request to the Github site along these lines.

Also, it looks like I jumped the gun a little on some of my crash/no crash reports. AIREBO/Morse still ended up crashing, though later than AIREBO with LJ forces did. So it appears that’s there a bug in the Morse calculation too. Still no crashes with AIREBO with LJ_flag set to 0, and still no crashes with AIREBO with LJ_flag set to 1 when TOL is lowered to 1.0E-3, though it’s possible that a crash will still come. For my reference system, normal AIREBO crashed at 779 ps, AIREBO/Morse crashed at 9,539 ps, AIREBO with LJ_flag set to 0 is still going strong up to 16,000 ps, and AIREBO with LJ_flag set to 1 with TOL set to 1.0E-3 is still going strong up to 5,000 ps.

HI - thanks for the AIREBO details from Efram and Steve S. It sounds like we should implement
the splining idea to avoid the singularity for 1/sin() when 3 atoms are colinear (sin() -> 0),
rather than the simple tolerance check used now.

Physically, when are 3 atoms (nearly) colinear? Is this only an issue
with high-temp conformations?

Efram: I think this issue is independent of LJ vs Morse. I.e. the dihedral
calculation with the 2 sin() values is common to both.

Questions for Steve: when you say this:

This cubic spline isn’t part of the original AIREBO potential, so this is an “off-label” fix.

Does that mean the LAMMPS implementation is faithful to the original 2000
paper, i.e. the singularity in the potential existed in the eqs in that paper,
and you made the spline addition later in your code?

Or Is it only a numerical issue associated with how you define the dihedral in the limiting case
of colinear atoms? If so, the common CHARMM, AMBER, etc dihedral
potentials (also in LAMMPS) don’t have this issue (and also don’t do splining).
So I’m fuzzy on what issue the spline is addressing, or if there is a simpler fix.

Also for Steve: are there other “off-label” changes you recall, that we should know about
and add to the LAMMPS implementation? I see some other uses of a tolerance
parameter in the code, which now I wonder about.

Question for the Curtin/MIT collaborators: in previous emails you said about your AIREBO code,
which didn’t crash when LAMMPS sometimes did:

I got these routines a long time ago when collaborating with a CNRS colleague and never touched the energy and force part. I >used it extensively (essentially for graphitic systems) and it has always given solid results. Looking at it in the past days I found >some comments by “sjs” in the energy and force routines so I suspect it might come originaly from Steve Stuart’s group.

Can you check if your code also does the splining on the dihedral calculation?

I’ll send another email off-line about code versions and comparisons …

Thanks,
Steve

hi everybody,

following some details posted on github a few days ago about out-of-bounds array accesses in AIREBO, which can result in incorrect spline coefficients https://github.com/lammps/lammps/issues/59 , i have refactored the various high level spline routines to avoid OOB scenarios cleanly and also simplified it a bit. the refactored version is available here: https://github.com/lammps/lammps/pull/483

with some of the problem causing inputs provided, this change results in small deviations, while previously non-problematic inputs are still producing identical numbers. together with the changes we integrated in february, we should be a significant step further along the path to have a bug-free airebo pair style in LAMMPS.

if you have time to test, please give it a try and provide feedback by adding comments to either the issue or the pull request listed above.

thanks,
axel.

Hi all,

I want to report that the latest version of AIREBO included in the August 11 2017 release is crashing for me again, though now only for Monte Carlo simulations. The issue appears with the REBO part of the potential; when I set LJ_flag to 0 and TORSION_flag to 0, crashes still occur. It could be that the issue is with the REBO potential itself rather than the LAMMPS implementation, if it wasn’t meant to handle carbon atoms being in very strange positions relative to each other (which wouldn’t occur with MD but does occur when testing out certain configurations with GCMC).

A (not-so-minimal) working example is attached. The seg fault occurs for me on one computer timestep 3600, which takes around 4 hours on a single processor. Unfortunately I find that the seg fault occurs at different timesteps on different computers even when keeping the random number seed constant (not sure why, would need to look into the GCMC source code more), and it doesn’t always occur, so this will be a tricky one to reproduce and debug. I’ve tried doing some gdb debugging on my own, but the backtrace takes me to a line in dump.cpp, so I’d need to do some more work on this to pinpoint it better. I may look into this further, but the bug is not so important if it doesn’t indicate anything wrong with the AIREBO potential itself.

If the error is inherent to AIREBO and has to do with inserted carbon atoms nearly overlapping or something like that, a potential workaround is making the “region” for the GCMC command exclude the space around existing carbon atoms where insertion moves wouldn’t get accepted anyway.

Huge thanks to everyone for getting the AIREBO part fixed up! It’s letting me do what I need to do! No immediate need to respond to this one, but I thought it might be good for developers to be aware.

system.in.old (2.9 KB)

system.blockpockets (40.9 KB)

system.data (20.2 KB)

So in your MC move/insertion, how close are you putting 2 C atoms to each

other and asking for the energy?

It seems like you should be able to isolate the precise MC snapshot that leads

to a crash and create a data file that LAMMPS REBO cannot

compute the energy for, e.g. gives a NaN or the like. That might

well crash LAMMPS if you try to dump the values or do some

other computation with the energy value.

Steve

Good idea of course. I’ve modified fix_gcmc.cpp to print out the attempted insertion coordinates. Let’s see if this works. I’ll get back to you tomorrow if it does, or in a few days if I need to debug my debugging.

Okay, I’ve done the debugging as mentioned. Oddly, it looks like the issue actually has to do with dumping the trajectory rather than the forcefield.

  • The error gets thrown AFTER the last move that takes place at timestep=6799 is rejected. So the error is thrown while writing the dump file, and has nothing to do with a change in the atomic positions messing up the energy calculation.
  • Changing the dump frequency from 100 timesteps to 1 timestep gets rid of the segfault (this simulation is still going strong at timestep 12200, even though the logfile shows that the exact same trajectory was followed up to the time that the other simulation failed).
  • I used gdb to backtrace the segfault, and the line that gives the error is actually in dump.cpp. The output of the backtrace is below.

Backtrace:

Program received signal SIGSEGV, Segmentation fault.

0x00000000005047a2 in LAMMPS_NS::Dump::idcompare (i=16, j=-1309081512, ptr=) at …/dump.cpp:803
803 if (idsort[i] < idsort[j]) return -1;
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.132.el6.x86_64 libgcc-4.4.7-4.el6.x86_64 libstdc+±4.4.7-4.el6.x86_64
(gdb) backtrace
#0 0x00000000005047a2 in LAMMPS_NS::Dump::idcompare (i=16, j=-1309081512, ptr=) at …/dump.cpp:803
#1 0x000000000050594d in do_merge (index=0xd01980, num=129, ptr=0xd03da0, comp=0x504790 <LAMMPS_NS::Dump::idcompare(int, int, void*)>) at …/mergesort.h:54
#2 merge_sort (index=0xd01980, num=129, ptr=0xd03da0, comp=0x504790 <LAMMPS_NS::Dump::idcompare(int, int, void*)>) at …/mergesort.h:108
#3 0x0000000000506ad5 in LAMMPS_NS::Dump::sort (this=0xd03da0) at …/dump.cpp:706
#4 0x00000000005072e8 in LAMMPS_NS::Dump::write (this=0xd03da0) at …/dump.cpp:401
#5 0x0000000000711325 in LAMMPS_NS::Output::write (this=0xc52f90, ntimestep=6800) at …/output.cpp:310
#6 0x0000000000942d04 in LAMMPS_NS::Verlet::run (this=0xc550a0, n=10000000) at …/verlet.cpp:344
#7 0x0000000000912820 in LAMMPS_NS::Run::command (this=0x7fffffffdc00, narg=, arg=0xce9810) at …/run.cpp:183
#8 0x000000000068b153 in LAMMPS_NS::Input::command_creator<LAMMPS_NS::Run> (lmp=, narg=1, arg=0xce9810) at …/input.cpp:861
#9 0x000000000068986b in LAMMPS_NS::Input::execute_command (this=0xc39880) at …/input.cpp:844
#10 0x000000000068ab5c in LAMMPS_NS::Input::file (this=0xc39880) at …/input.cpp:244
#11 0x000000000069a926 in main (argc=1, argv=0x7fffffffdeb8) at …/main.cpp:60

(gdb)

Hi mailing list responders,

Thanks again for having been a huge help with getting AIREBO up and running. Your efforts allowed me to complete a study which has just gotten published. I wanted to point out that if you take a look at the acknowledgements section of http://doi.org/10.1073/pnas.1805062115, you’ll see my thanks in print.