Unexpected behaviour during hybrid Monte-Carlo simulations (ReaxFF)

Dear LAMMPS users,

I am trying to run Monte Carlo simulations with ReaxFF to study the oxygen coverage on a Pt nanoparticle. I defined my oxygen reservoir using the ideal-gas expression for the chemical potential. Initially, the system is very small (350 atoms) and placed in a 40x40x40 box.
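To make that concrete, here is a minimal Python sketch of the translational ideal-gas chemical potential I mean (the O2 mass, the 1 atm pressure, and the unit conversions are my assumptions for illustration; my actual input value of -73.15 kcal/mol also folds in an energetic reference for the O2 molecule, so the numbers are not directly comparable):

```python
import math

# physical constants (SI)
kB = 1.380649e-23       # Boltzmann constant, J/K
h  = 6.62607015e-34     # Planck constant, J*s
NA = 6.02214076e23      # Avogadro's number, 1/mol
AMU = 1.66053906660e-27 # atomic mass unit, kg

def mu_ideal_gas(T, P, m_amu):
    """Translational chemical potential of an ideal gas,
    mu = kB*T * ln(P * Lambda^3 / (kB*T)), returned in kcal/mol."""
    m = m_amu * AMU
    lam = h / math.sqrt(2.0 * math.pi * m * kB * T)  # thermal de Broglie wavelength, m
    mu_J = kB * T * math.log(P * lam**3 / (kB * T))  # per molecule, J
    return mu_J * NA / 4184.0                        # convert J/molecule -> kcal/mol

# e.g. O2 (31.998 amu) at 350 K and 1 atm (101325 Pa)
print(round(mu_ideal_gas(350.0, 101325.0, 31.998), 2))
```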

I want to explore the phase space of the Pt nanoparticle at a given temperature so I also use the “fix nvt” command.

At first, everything seems to go well: the temperature of the system equilibrates nicely. But after a while the dump file (.xyz) becomes unreadable because of corrupted output: lines with strange characters, ghost atoms (sometimes duplicates), and I even saw an extra platinum atom appear. The average temperature also seems to drift down slightly (3~5 K).

Sometimes the simulation crashes with a segmentation fault; sometimes it just keeps going.

I am not an expert in MD simulations, and neither is my group or my supervisor. I did my best to set up something coherent, but given the complexity of the ReaxFF force field, the problem surely comes from me.

Any comments would be appreciated.

input: (I run this using 4 MPI processes)

boundary        f f f  # Periodicity doesn't really matter

units           real

atom_style      charge
read_data       mc.lmp_data

pair_style      reaxff NULL
pair_coeff      * * ./ffield.reax.pt_o_h Pt O

fix             q_equi all qeq/reax 1 0.0 10.0 1e-6 reax/c #Charge equilibration for reaxff

thermo_style    custom step temp press atoms density etotal pe ke
thermo          5

timestep        0.25

neighbor        2.0 bin #Default for units real
neigh_modify    every 5 delay 0 check yes #We build every time MC is called

minimize        1.0e-8 1.0e-9 100000 1000000 #Relax the nanoparticle

region          mc_in sphere 20 20 20 16 side in 
region          mc_out sphere 20 20 20 6 side out

region          mc intersect 2 mc_in mc_out #We limit the MC volume to the nanoparticle surface

restart         25000 gcmc.*.restart

group           gas type 2 #Oxygen
group           nano type 1 #Pt

velocity        nano create 350.0 335679138 dist gaussian #The pt nanoparticle will be sampled at 350K

compute_modify  thermo_temp dynamic yes #Necessary if the number of atom changes

compute         nano_temp nano temp #set NVT and uVT to the same temp
compute_modify  nano_temp dynamic/dof yes

fix             nanofix nano nvt temp 350.0 350.0 $(100.0*dt) # NVT for the nanoparticle
fix_modify      nanofix temp nano_temp

dump            gcmc_equi all xyz 50 lmp_gcmc.xyz 
dump_modify     gcmc_equi element Pt O

run             40000 #Small NVT equilibration for the nanoparticle

fix             gcmc_equi gas gcmc 5 25 25 2 147028938 350.0 -73.15115097 0.05 overlap_cutoff 0.5 full_energy region mc #MC is frequent otherwise the convergence might be slow

run             500000

Please always report which LAMMPS version you are using and what platform you are running on.

These are signs of memory corruption. There are two main causes for that:

  1. one or more bugs in the code
  2. faulty hardware (RAM or CPU) or insufficient cooling

To minimize the risk of a bug in the code, you should use the latest available LAMMPS version, which is currently the 23 June 2022 release.

To check for failing hardware, you can try an independent testing tool like memtest86.

If the issue persists and cannot be attributed to weak hardware or insufficient cooling, then you should try to set up as small a system as possible that can reproduce it with as few steps as possible.

xyz-format dump files are usually a bad choice because they contain only limited information.

It may still be advisable to use either “m m m” boundaries or apply reflective or soft repulsive wall fixes to avoid lost atoms.
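For example, something along these lines (a sketch only; which option to use, and where the walls go, would have to match your actual box):

```
boundary        m m m            # shrink-wrapped box with a minimum bound
# or, keeping fixed boundaries, reflect atoms back at the box faces:
fix             walls all wall/reflect xlo EDGE xhi EDGE ylo EDGE yhi EDGE zlo EDGE zhi EDGE
```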

Then you are advised to seek out some experienced collaborator(s). There are many subtle issues with MD in general and ReaxFF in particular and you cannot learn about those effectively from textbooks, published literature, or online forums.

Thank you for your answer,

The version I am using is a precompiled version (29 Sep 2021) from the supercomputer I use (Red Hat Enterprise Linux, Intel Classic C++ 20.21.2 / Intel(R) C++ g++ 4.8.5 mode with OpenMP 5.0 preview 1). I’ve never had any issue of this sort before.

I compiled my own version (23 Jun 2022) on another supercomputer (SUSE Linux Enterprise Server, GNU C++ 10.2.0 20200723 (Cray Inc.) with OpenMP 4.5) with the help of the HPC support and was able to reproduce the issue.

So far, however, I have not been able to reproduce the issue on my laptop (Apple M1, ARM64, Homebrew version).

That’s what I thought, but to be fair, given my limited experience with MD, I also thought a mistake on my part was more likely than a bug in the code. That’s why I posted my input here to have it checked by experts.

Couldn’t it be because the system is small and unbalanced (a nanoparticle at the center of a box), so that some processors have many more atoms than others?

For now I am trying to navigate with my (average) knowledge of statistical mechanics and common sense. I have been trying to find knowledgeable collaborators, without success so far.

What would be important is to have a stack trace - either from valgrind or from gdb - to locate in which part of the code the memory corruption happens.
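For example (binary name, input file, and launcher are placeholders for your actual setup):

```
# serial run under valgrind's memory checker
valgrind --track-origins=yes ./lmp -in in.gcmc

# or one gdb session per MPI rank, each in its own terminal window
mpirun -np 4 xterm -e gdb -ex run --args ./lmp -in in.gcmc
# after the crash, type "bt" in gdb to print the stack trace
```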

Unlikely. How many processors do you use? Given the small size of the system, there is not much to gain from using many MPI ranks; it may be beneficial to use multi-threading instead. If there is memory corruption due to over-parallelization, it would be in the bond-order code of ReaxFF. That can be adjusted with settings on the pair_style command line (safezone, mincap, minhbonds), and the risk can be minimized by compiling/using the (serial) KOKKOS version of the pair style.
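For illustration, those keywords go directly on the pair_style line (the values below are just examples to be tuned; the documented defaults are safezone 1.2, mincap 50, minhbonds 25):

```
pair_style      reaxff NULL safezone 1.6 mincap 100 minhbonds 25
```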

The fact that you are using ReaxFF is a significant complication. Perhaps you should first try a simpler system, e.g. using EAM for a metal nanoparticle and some other element to deposit on it that can be modeled by lj/cut.
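A sketch of what such a stripped-down test system could look like (the potential file name and Lennard-Jones parameters are placeholders, not recommendations):

```
pair_style      hybrid eam lj/cut 10.0
pair_coeff      1 1 eam Pt_u3.eam       # hypothetical EAM file for Pt
pair_coeff      2 2 lj/cut 0.05 3.0     # placeholder epsilon/sigma for the adsorbate
pair_coeff      1 2 lj/cut 0.10 2.7     # placeholder cross interaction
```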

With the nanoparticle in the center of the box, you should be able to use 8 MPI ranks without having to worry about load imbalance or MPI ranks without atoms (further parallelization could then be obtained with MPI+OpenMP). It would also be preferable to start from a system already (over)populated with oxygen atoms: for the memory management issues I mentioned, it is always safer to remove atoms than to add them, since the ReaxFF implementation in LAMMPS assumes that the number of atoms and the number of interactions do not change much and only allows them to grow by a limited amount.
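For example (a sketch; the package/suffix lines assume LAMMPS was built with the OPENMP package):

```
processors      2 2 2        # one rank per octant around the centered particle
package         omp 2        # 2 OpenMP threads per MPI rank
suffix          omp
```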

I use 4 MPI processes. I tried multi-threading (2 OpenMP threads), which leads to a slight speed-up.

I will have a look at that

Thank you for the advice

My predecessor and I did try some Gupta potentials before with relative success (also without oxygen). This ReaxFF force field reproduces adsorption energies (and some energy barriers) with an accuracy that we cannot hope to get with simpler potentials. Without going into too much detail, the nanoparticles we use have “realistic” (non-perfect) shapes taken from experiments, with quite a lot of steps, kinks, and ad-atoms. We found that simpler potentials are less accurate for these shapes, sometimes leading to weird geometries. We hope that we can feed some of these geometries into higher-level theories (ab initio).

I didn’t mean to use those for research but as a means to build your skills. Both using fix gcmc and using ReaxFF are advanced simulation techniques and thus require more experience, care and practice. If you experiment with a system where you leave out one of the complications until you are more comfortable and confident in your choice of settings, you make it much easier to acquire the skills, to tell if something is wrong, and correct for it.

Update on this issue:

After increasing the size of the box and changing the periodic boundary conditions as you advised, the simulation seems more stable (it no longer crashes).

However, at some point I start to see oxygen atoms much too close to each other (~0.4 Å; please see the attached image, top right and bottom left). During the simulation, one of the oxygen atoms clearly stops being updated (it is never selected for an MC translation and no longer moves), and after a few timesteps another oxygen appears on top of it. It seems that these oxygens are “forgotten” by the simulation.

I dumped the processor (MPI rank) associated with each atom in the dump file. Each time this phenomenon occurs (the atom stops being updated), it is when the oxygen atom migrates between processors, so something is going on there.

The problem (seemingly) disappears when I lower the MC translation distance from 0.05 to 0.01 Å. I think this doesn’t fix the underlying cause, but it makes the conditions for the problem to occur less likely.

Perhaps @athomps has some suggestions for you.

This sounds like a common problem with MC sampling: it is capable of finding unphysical low-energy regions of configuration space that are inaccessible to thermal dynamics but may be accessible by other sampling methods. You could suppress this behavior by adding a repulsive core, or you can be more conservative in your MC sampling.

Right, but here it seems that one of the oxygen atoms is no longer updated by MC at all (coincidentally, right when it changed processor):

It seems that the simulation no longer “sees” this oxygen atom (it is not selected by any MC move for the rest of the simulation). That might explain why another oxygen is later inserted at the same location.

If you tell me that this is normal, I believe you, I just want to make sure the problem is well understood.

I guess we don’t have enough information to know whether this is an unphysical attraction or something else. You can check by writing out the configuration with the overlapping oxygens and reading it back in: if unphysical attraction is the culprit, the energy will be unchanged.
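A sketch of that round trip (the file name is hypothetical; the fresh input would repeat the same pair_style/pair_coeff and charge-equilibration setup before reading the data file):

```
# in the running simulation, once the overlap is present:
write_data      overlap.data

# then, in a fresh input deck:
read_data       overlap.data
run             0 post no    # single-point energy to compare with the on-the-fly value
```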

If you send me a “small as possible” example of fix gcmc + pair_style reaxff questionable behavior, I can take a look.

I tried a few things:

  1. I restarted with read_restart: the oxygens stay the same (overlapping) and one is still not moving. The simulation runs the same way, which makes sense.

  2. I extracted the “bad” geometry and started from scratch, same input file but different starting geometry:

    • If I do a minimisation, the two oxygen atoms instantly stop overlapping.
    • If I run a single-point calculation on this snapshot, the potential energy is higher than for the same snapshot during the GCMC simulation: -55912.221 kcal/mol vs. -56738.642 kcal/mol.
    • If I manually remove the two oxygen atoms and run a single-point calculation, the potential energy is much closer: -56852.551 kcal/mol vs. -56738.642 kcal/mol.

Also, this might be related: add 'overlap' keyword to fix gcmc · Issue #409 · lammps/lammps · GitHub

Also, I sometimes see the warning “WARNING: Energy of old configuration in fix gcmc is > MAXENERGYTEST. (src/MC/fix_gcmc.cpp:738)” followed by a huge drop in the atom count.

Please send a small example.

Thanks for the example and for noticing this issue. There was indeed a bug when translating single atoms in parallel with full_energy. The fix is yet another full-energy calculation. I will need to do a little more checking to see if the same thing occurs elsewhere, but for now here is the patch in src/MC/fix_gcmc.cpp, FixGCMC::attempt_atomic_translation_full():

-    energy_stored = energy_before;
+    // compute old energy, to ensure that atom is moved back to old proc
+    energy_stored = energy_full();