I am trying to run Monte Carlo simulations with ReaxFF to study the oxygen coverage on a Pt nanoparticle. I defined my oxygen reservoir using the ideal-gas expression for the chemical potential. Initially, the system is very small (350 atoms) and placed in a 40x40x40 Å box.
I want to explore the phase space of the Pt nanoparticle at a given temperature so I also use the “fix nvt” command.
At first, everything seems to go well: the temperature of the system is well equilibrated. But after a while, the dump file (.xyz) becomes unreadable because of corrupted output: lines with strange characters, ghost atoms (sometimes duplicates); I even saw an extra platinum atom appear. It also seems that the average temperature drifts down slightly (by 3-5 K).
Sometimes the simulation crashes with a “segmentation fault”, sometimes it just keeps going.
I am not an expert in MD simulations, and neither is my group or my supervisor. I did my best to set up something coherent, but given the complexity of the ReaxFF force field the problem most likely comes from me.
Any comments will be appreciated.
Input (I run this using 4 MPI processes):
boundary f f f # Periodicity doesn't really matter
units real
atom_style charge
read_data mc.lmp_data
pair_style reaxff NULL
pair_coeff * * ./ffield.reax.pt_o_h Pt O
fix q_equi all qeq/reax 1 0.0 10.0 1e-6 reax/c #Charge equilibration for reaxff
thermo_style custom step temp press atoms density etotal pe ke
thermo 5
timestep 0.25
neighbor 2.0 bin #Default for units real
neigh_modify every 5 delay 0 check yes #We build every time MC is called
minimize 1.0e-8 1.0e-9 100000 1000000 #Relax the nanoparticle
region mc_in sphere 20 20 20 16 side in
region mc_out sphere 20 20 20 6 side out
region mc intersect 2 mc_in mc_out #We limit the MC volume to the nanoparticle surface
restart 25000 gcmc.*.restart
group gas type 2 #Oxygen
group nano type 1 #Pt
velocity nano create 350.0 335679138 dist gaussian #The pt nanoparticle will be sampled at 350K
compute_modify thermo_temp dynamic yes #Necessary if the number of atom changes
compute nano_temp nano temp #set NVT and uVT to the same temp
compute_modify nano_temp dynamic/dof yes
fix nanofix nano nvt temp 350.0 350.0 $(100.0*dt) # NVT for the nanoparticle
fix_modify nanofix temp nano_temp
dump gcmc_equi all xyz 50 lmp_gcmc.xyz
dump_modify gcmc_equi element Pt O
run 40000 #Small NVT equilibration for the nanoparticle
fix gcmc_equi gas gcmc 5 25 25 2 147028938 350.0 -73.15115097 0.05 overlap_cutoff 0.5 full_energy region mc #MC is frequent otherwise the convergence might be slow
run 500000
Please always report which LAMMPS version you are using and what platform you are running on.
These are signs of memory corruption. There are two main causes for that:
one or more bugs in the code
faulty hardware (RAM or CPU) or insufficient cooling
To minimize the risk of a bug in the code, you should use the latest available LAMMPS version, which is currently 23 June 2022.
To check for failing hardware you can try an independent testing tool like memtest86.
If the issue persists and cannot be attributed to weak hardware or insufficient cooling, then you should try to set up as small a system as possible that can reproduce it as quickly as possible, so it can be reported and debugged as a bug.
xyz format dump files are usually a bad choice because they contain only limited information (just element symbols and coordinates; no atom IDs, charges, or image flags).
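For comparison, a dump in LAMMPS's native custom format keeps atom IDs, types and charges as well; a minimal sketch (file name and output interval are placeholders) would be:
dump traj all custom 50 lmp_gcmc.lammpstrj id type q x y z ix iy iz # IDs, types, charges, coordinates, image flags
dump_modify traj sort id # write atoms in ID order so frames are easy to compare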
It may still be advisable to use either “m m m” boundaries or apply reflective or soft repulsive wall fixes to avoid lost atoms.
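As a rough illustration (the fix ID is arbitrary), that would mean replacing the "boundary f f f" line with
boundary m m m # shrink-wrapped box with a minimum size (must come before read_data), so atoms cannot be lost
or keeping "f f f" and adding a reflective wall on all six box faces:
fix walls all wall/reflect xlo EDGE xhi EDGE ylo EDGE yhi EDGE zlo EDGE zhi EDGE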
Then you are advised to seek out some experienced collaborator(s). There are many subtle issues with MD in general and ReaxFF in particular and you cannot learn about those effectively from textbooks, published literature, or online forums.
The version I am using is a precompiled version (29 Sep 2021) from the supercomputer I use (Red Hat Enterprise Linux, Intel Classic C++ 2021.2 / Intel(R) C++ g++ 4.8.5 mode with OpenMP 5.0 preview 1). I’ve never had any issue of this sort before.
I compiled my own version (23 Jun 2022) on another supercomputer (SUSE Linux Enterprise Server, GNU C++ 10.2.0 20200723 (Cray Inc.) with OpenMP 4.5) with the help of the HPC support and was able to reproduce the issue.
So far, however, I have not been able to reproduce the issue on my laptop (Apple M1, ARM64, Homebrew version).
That’s what I thought, but to be fair, given my limited experience with MD, I also thought that a mistake on my part was more likely than a bug in the code; that’s why I posted my input here to have it checked by experts.
Couldn’t it be because the system is small and unbalanced (a nanoparticle at the center of a box), so that some processors have many more atoms than others?
For now I am navigating with my (average) knowledge of statistical mechanics and common sense. I have been trying to find knowledgeable collaborators, without success so far.
What would be important is to have a stack trace - either from valgrind or from gdb - to locate in which part of the code the memory corruption happens.
Unlikely. How many processors do you use? Given the small size of the system, there is not much to gain from using many MPI ranks. It may be beneficial to use multi-threading instead. If there is memory corruption due to overparallelization, then it would be in the bond-order code of ReaxFF; that can be adjusted with keywords of the pair_style command (safezone, mincap, minhbonds), and the risk minimized by compiling/using the (serial) KOKKOS version of the pair style.
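For instance, the memory-reservation keywords are set directly on the pair_style line; the values below are purely illustrative, not recommendations:
pair_style reaxff NULL safezone 1.6 mincap 100 minhbonds 30 # enlarge the ReaxFF memory reservations
# if LAMMPS was built with the KOKKOS package, the KOKKOS variant can be selected at run time,
# e.g. with the "-k on t 1 -sf kk" command-line switches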
The fact that you are using ReaxFF is a significant complication. Perhaps you should first try a simpler system, e.g. using EAM for a metal nanoparticle and some other element to deposit on it that can be modeled by lj/cut.
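A stripped-down practice setup along those lines might look roughly like this; it is only a sketch, the data file name and the LJ parameters are placeholders, and Pt_u3.eam refers to the funcfl-format Pt potential distributed with LAMMPS:
units metal
atom_style atomic
boundary m m m
read_data practice.lmp_data # hypothetical data file: type 1 = Pt, type 2 = adsorbate
pair_style hybrid eam lj/cut 10.0
pair_coeff 1 1 eam Pt_u3.eam # Pt-Pt from EAM
pair_coeff 1 2 lj/cut 0.10 2.5 # placeholder Pt-adsorbate LJ parameters
pair_coeff 2 2 lj/cut 0.05 3.0 # placeholder adsorbate-adsorbate LJ parameters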
With the nanoparticle in the center of the box, you should be able to use 8 MPI ranks without having to worry about load imbalance and MPI ranks without atoms (further parallelization could then be obtained by using MPI+OpenMP). It would also be preferable to start from a system already (over)populated with oxygen atoms. For the memory management issues I mentioned, it is always better to remove atoms than to add them: the number of atoms or pairs of atoms may only grow by a limited amount with the ReaxFF implementation in LAMMPS, which assumes that the number of atoms and interactions does not change much.
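One way to start from an overpopulated surface (a sketch; the count, seed and cutoff are arbitrary) is to scatter oxygen atoms into the MC region right after it is defined and before the "group gas type 2" command, so the new atoms are included in the gas group:
create_atoms 2 random 300 482793 mc # drop ~300 type-2 (O) atoms at random positions inside region mc
delete_atoms overlap 1.5 all all # prune insertions that landed too close to an existing atom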
I use 4 MPI processes. I tried multi-threading (2 OpenMP threads), which leads to a slight speed-up.
I will have a look at that
Thank you for the advice
My predecessor and I did try some Gupta potentials before, with relative success (also without oxygen). This ReaxFF force field reproduces adsorption energies (and some energy barriers) with an accuracy that we cannot hope to get with simpler potentials. Without going too much into details, the nanoparticles we use are “realistic” (non-perfect) shapes from experiments, with quite a lot of steps, kinks and ad-atoms. We found that simpler potentials are less accurate for these shapes, sometimes leading to weird geometries. We hope that we can feed some of these geometries into higher-level theories (ab initio).
I didn’t mean to use those for research but as a means to build your skills. Both using fix gcmc and using ReaxFF are advanced simulation techniques and thus require more experience, care and practice. If you experiment with a system where you leave out one of the complications until you are more comfortable and confident in your choice of settings, you make it much easier to acquire the skills, to tell if something is wrong, and correct for it.
After making the box bigger and changing the boundary conditions as you advised, the simulation seems to be more stable (it doesn’t crash).
However, at some point I start to see oxygen atoms too close to each other (~0.4 Å; please see the attached image, top right and bottom left). During the simulation, one of the oxygen atoms clearly stops being updated (it is never selected for an MC translation and doesn’t move anymore), and after a few timesteps another oxygen atom appears on top of it. It seems that these oxygen atoms are “forgotten” by the simulation.
I added the processor ID associated with each atom to the dump file. Each time this happens (an atom stops being updated), it is when the oxygen atom migrates to another processor; something is going on there.
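For reference, the per-atom processor rank can be written out with the "proc" attribute of a custom dump, for example (file name and interval are placeholders):
dump dproc all custom 50 lmp_gcmc_proc.lammpstrj id type proc x y z # proc = MPI rank that currently owns the atom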
The problem disappears (it seems) when I lower the MC translation displacement from 0.05 to 0.01 Å. I think this doesn’t fix the underlying cause, but it makes the conditions for the problem to occur less likely.
This sounds like a common problem with MC sampling: it is capable of finding unphysical low-energy regions of configuration space that are inaccessible to thermal dynamics but may be accessible to other sampling methods. You could suppress this behavior by adding a repulsive core, or you could be more conservative in your MC sampling.
Right, but here it seems that one of the oxygen atoms is not updated by MC anymore (coincidentally, right when it changed processors):
It seems that the simulation does not “see” this oxygen atom anymore (it is not selected by any MC move for the rest of the simulation). This might explain why another oxygen atom is added later at the same location.
If you tell me that this is normal, I believe you; I just want to make sure the problem is well understood.
I guess we don’t have enough information to know if this is an unphysical attraction or something else. You can check whether it is something else by writing out the configuration with the overlapping oxygen atoms and reading it back in. If unphysical attraction is the culprit, the energy will be unchanged.
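For example (a sketch; the file name is a placeholder), the suspect configuration can be written out between runs and then re-evaluated in a fresh input that uses the same pair_style, pair_coeff and qeq settings:
write_data overlap_check.data # in the running input: save the configuration with the overlapping O atoms
# then, in a separate input deck with identical force-field setup:
read_data overlap_check.data
run 0 # single-point evaluation; the thermo output at step 0 gives the potential energy to compare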
I restarted with read_restart: the oxygen atoms stay the same (still overlapping), and one is still not moving. The simulation runs the same, which makes sense.
I extracted the “bad” geometry and started from scratch, with the same input file but this different starting geometry:
If I do a minimisation, the two oxygen atoms instantly stop overlapping.
If I run a single-point calculation on this snapshot, the potential energy is higher than what was reported for the same snapshot during the GCMC simulation: -55912.221 kcal/mol vs. -56738.642 kcal/mol.
If I manually remove the two oxygen atoms and run a single-point calculation, the potential energy is closer: -56852.551 kcal/mol vs. -56738.642 kcal/mol.
I also sometimes see the warning “WARNING: Energy of old configuration in fix gcmc is > MAXENERGYTEST. (src/MC/fix_gcmc.cpp:738)”, followed by a huge drop in the atom count.
Thanks for the example and for noticing this issue. There was indeed a bug when translating single atoms in parallel with full energy. The fix is yet another full energy calculation. I will need to do a little more checking to see if the same thing occurs elsewhere, but for now here is the patch in src/MC/fix_gcmc.cpp FixGCMC::attempt_atomic_translation_full():
- energy_stored = energy_before;
+
+ // compute old energy, to ensure that atom is moved back to old proc
+
+ energy_stored = energy_full();