Problem regarding fix atom/swap

Hi all,

I am currently running a Hybrid MD/MC simulation to observe the precipitation sequence and evolution of an Al-Mg-Mn-Sc-Zr alloy during isothermal heat treatment at 573 K. The initial configuration is derived from a Laser Powder Bed Fusion (LPBF) simulation.

System and Resources:

  • LAMMPS Version: 29 Aug 2024

  • Potential: DeepMD (Machine Learning Potential) combined with ZBL.

  • System Size: ~850,000 atoms total (approx. 370,000 atoms involved in the swap group).

  • Hardware: 1 Node with 8x Tesla V100 GPUs (running via Singularity container).

  • Parallelism: 8 MPI tasks (1 per GPU) with 2 OpenMP threads each.

The Challenge:

Since DeepMD inference is computationally expensive, the MC swap steps (which require energy re-calculation) significantly slow down the simulation.

  • Pure MD speed: ~0.21 ns/day.

  • Hybrid MD/MC speed: ~0.117 ns/day (a ~44% slowdown).

To maintain a manageable simulation speed, I have set the fix atom/swap parameters to a relatively low frequency and low number of attempts:

  • N (invoke frequency) = 100 steps (0.1 ps)

  • X (max attempts) = 5 to 10 attempts per fix command.

  • Total swap attempts per 0.1 ps: 45 (sum across all 6 fix commands).
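For scale, here is a quick back-of-the-envelope estimate (a Python sketch using only the parameters listed above) of the swap-attempt rate these settings correspond to:

```python
# Back-of-envelope swap-attempt rate for the settings above:
# N = 100 steps between invocations, 45 total attempts per invocation
# (summed over all six fix commands), 1 fs timestep, ~370,000 atoms
# in the swap group.

timestep_ps = 0.001          # metal units: 1 fs
invoke_every_steps = 100     # N in fix atom/swap
attempts_per_invoke = 45     # sum of X over the six fix commands
swap_group_atoms = 370_000

invokes_per_ns = 1000.0 / (invoke_every_steps * timestep_ps)
attempts_per_ns = invokes_per_ns * attempts_per_invoke
attempts_per_atom_per_ns = attempts_per_ns / swap_group_atoms

print(f"{attempts_per_ns:,.0f} attempts/ns "
      f"({attempts_per_atom_per_ns:.2f} per swap-group atom per ns)")
```

At the ~7% acceptance ratio reported later in the thread, this works out to roughly 0.1 accepted swaps per solute site per nanosecond, which is the concrete number to weigh against the microstructural evolution I hope to observe.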

My Questions:

  1. Physical Validity: Given the extremely slow diffusion in solids, will such a low swap frequency (attempting only ~0.01% of atoms every 0.1 ps) be sufficient to overcome the time-scale limitations and effectively simulate precipitation/clustering evolution? Or is this flux too low to observe meaningful microstructural changes within a few nanoseconds of simulation time?

  2. Efficiency Strategy: Are there specific strategies for optimizing fix atom/swap when using expensive ML potentials like DeepMD? Is it better to perform fewer swaps more frequently (e.g., N=10, X=1) or bulk swaps less frequently (e.g., N=1000, X=100) to minimize GPU communication/invocation overhead?
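To make question 2 semi-quantitative, here is a rough cost model (a Python sketch) under the assumption that each fix atom/swap invocation costs about one full-energy evaluation as a reference plus one per attempt; that assumption would need to be verified against the actual fix atom/swap implementation:

```python
# Rough MC cost model: full-energy evaluations per ns for a given
# invocation interval N (steps) and attempts-per-invocation X.
# Assumes ~1 reference evaluation per invocation plus 1 per attempt,
# which should be checked against the fix atom/swap source.

def mc_cost_per_ns(n_steps, x_attempts, timestep_ps=0.001):
    invokes = 1000.0 / (n_steps * timestep_ps)   # invocations per ns
    attempts = invokes * x_attempts              # swap attempts per ns
    evals = invokes * (1 + x_attempts)           # full-energy evaluations per ns
    return attempts, evals

# Same attempt budget (100,000 attempts/ns), different batching:
for n, x in ((10, 1), (100, 10), (1000, 100)):
    attempts, evals = mc_cost_per_ns(n, x)
    print(f"N={n:5d}, X={x:3d}: {attempts:,.0f} attempts/ns, "
          f"{evals:,.0f} energy evals/ns")
```

Under this assumption all three settings attempt the same number of swaps per nanosecond, but N=10, X=1 spends roughly twice as many full-energy evaluations as the batched N=1000, X=100; any per-call GPU launch overhead of DeepMD inference would shift the balance further toward batching.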

Below are my input script and relevant log outputs.

Input Script :

# 1. Initialization
units           metal
boundary        p p p
atom_style      atomic
neighbor        2.0 bin
neigh_modify    every 1 delay 0 check yes
timestep        0.001
read_data       test.data 

# 2. Force Field (DeepMD + ZBL)
pair_style      hybrid/overlay deepmd model-compress.pb zbl 1.9 2.4
pair_coeff      * * deepmd Al Mg Mn Sc Zr
pair_coeff      * * zbl 0 0
pair_coeff      1 1 zbl 13 13
pair_coeff      1 2 zbl 13 12
pair_coeff      1 3 zbl 13 25
pair_coeff      1 4 zbl 13 21
pair_coeff      1 5 zbl 13 40
pair_coeff      2 2 zbl 12 12
pair_coeff      2 3 zbl 12 25
pair_coeff      2 4 zbl 12 21
pair_coeff      2 5 zbl 12 40
pair_coeff      3 3 zbl 25 25
pair_coeff      3 4 zbl 25 21
pair_coeff      3 5 zbl 25 40
pair_coeff      4 4 zbl 21 21
pair_coeff      4 5 zbl 21 40
pair_coeff      5 5 zbl 40 40

# 3. Group definitions
region          powderbed block INF INF INF INF 32 INF units box
group           powderbed region powderbed

fix             1 all npt temp 573 573 0.1 x 0 0 1.0 y 0 0 1.0

# 4. MC Swap Settings
# Al-Mn
fix mc_Mn   powderbed atom/swap 100 10 12342 573 types 1 3 ke yes
# Al-Sc
fix mc_Sc   powderbed atom/swap 100 10 12343 573 types 1 4 ke yes
# Al-Zr
fix mc_Zr   powderbed atom/swap 100 10 12344 573 types 1 5 ke yes
# Sc-Zr (Core-shell competition)
fix mc_ScZr powderbed atom/swap 100 5 12349 573 types 4 5 ke yes
# Mg-related
fix mc_Mg   powderbed atom/swap 100 5 12341 573 types 1 2 ke yes
fix mc_MgMn powderbed atom/swap 100 5 12347 573 types 2 3 ke yes

# 5. Output
thermo          1
# Output acceptance counts for monitoring
thermo_style    custom step temp pe press vol &
                f_mc_Mn[1]   f_mc_Mn[2]   &
                f_mc_Sc[1]   f_mc_Sc[2]   &
                f_mc_Zr[1]   f_mc_Zr[2]   &
                f_mc_ScZr[1] f_mc_ScZr[2] &
                f_mc_Mg[1]   f_mc_Mg[2]   &
                f_mc_MgMn[1] f_mc_MgMn[2]

dump            1 all custom 100 test.xyz id type x y z vx vy vz 

run             1000

Log Output (Excerpt):

Per MPI rank memory allocation (min/avg/max) = 69.73 | 91.72 | 117.2 Mbytes
   Step          Temp          PotEng          Press          Volume        f_mc_Mn[1]     f_mc_Mn[2]     f_mc_Sc[1]     f_mc_Sc[2]     f_mc_Zr[1]     f_mc_Zr[2]     f_mc_ScZr[1]   f_mc_ScZr[2]    f_mc_Mg[1]     f_mc_Mg[2]    f_mc_MgMn[1]   f_mc_MgMn[2] 
         0   521.30925     -1.3931766e+08  102.28221      51336049       0              0              0              0              0              0              0              0              0              0              0              0            
      1000   567.19876     -1.3931208e+08  2.6469089      51282428       100            7              100            7              100            8              50             10             50             0              50             1            
Loop time of 740.609 on 16 procs for 1000 steps with 856260 atoms

Performance: 0.117 ns/day, 205.725 hours/ns, 1.350 timesteps/s, 1.156 Matom-step/s
80.9% CPU use with 8 MPI tasks x 2 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 185.55     | 247.57     | 313.65     | 367.9 | 33.43
Modify  | 337.53     | 354.47     | 359.76     |  34.8 | 47.86  <-- MC Swap overhead
Other   |            | 19.57      |            |       |  2.64

Job Submission Script (Slurm/Singularity):

#SBATCH --gres=gpu:8
# ...
singularity exec --nv -e -B /public:/public .../deepmd-kit_3.0.0rc0_cuda118.sif bash << EOF
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export TF_FORCE_GPU_ALLOW_GROWTH=true
export TF_NUM_INTEROP_THREADS=1
export TF_NUM_INTRAOP_THREADS=1
export OMP_NUM_THREADS=2
# ...
mpirun -np 8 lmp -in test.in
EOF

Any insights or suggestions would be greatly appreciated. Thank you in advance!

Hi @fyl18580063836,

It’s nearly impossible to draw conclusions here given the complexity of your system and potential. Hybrid MD/MC simulations are also a rather advanced technique, so it is hard to predict what to expect from them.

A common measure in MC simulations is the ratio of accepted to attempted moves at a given temperature, which gives you an idea of the energy barriers associated with each type of move. Note also that your system is already huge; you would do well to start with a smaller system to understand what is happening before launching production simulations on large systems.
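For reference, that ratio can be read straight off the fix atom/swap counters already printed via thermo_style (column [1] is attempted swaps, column [2] is accepted swaps). A minimal Python sketch, using the step-1000 counts from the log excerpt earlier in the thread:

```python
# Acceptance ratio per swap pair from the fix atom/swap counters
# (f_mc_*[1] = attempted swaps, f_mc_*[2] = accepted swaps),
# copied from the step-1000 line of the log excerpt above.

counts = {
    "Al-Mn": (100, 7),
    "Al-Sc": (100, 7),
    "Al-Zr": (100, 8),
    "Sc-Zr": (50, 10),
    "Al-Mg": (50, 0),
    "Mg-Mn": (50, 1),
}

ratios = {pair: accepted / attempted
          for pair, (attempted, accepted) in counts.items()}
for pair, r in ratios.items():
    print(f"{pair}: {r:.0%}")
```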


@fyl18580063836
In addition to experimenting with a smaller system, I would also first use a far less expensive potential (EAM or MEAM come to mind, MEAM in particular since it has the option to spline in a ZBL potential properly instead of just adding it on top as you do, which may add a significant error to your model) to explore your options and what you can do to get the information you are looking for. In fact, I would not be surprised if you could already extract most of what you are looking for without the ML potential (especially considering that you are augmenting it by adding ZBL).


Hi @Germain

Thank you for the insights.

I checked my logs as you suggested. At 573 K, the acceptance ratio for the primary precipitating pairs (Al-Sc, Al-Zr, and Al-Mn) is consistently around 7–8%.

Does this range indicate a healthy sampling efficiency that overcomes energy barriers without introducing unphysical disorder?

Regarding the system size, I realized I can carve out a smaller representative volume (ensuring sufficient solute content) to benchmark the N and X parameters efficiently. I will proceed with this optimization on the smaller system first before scaling back up.

Thanks again.

Hi @akohlmey
Thank you very much for your insightful comments.

Regarding the choice of potential:

I completely agree that using an EAM or MEAM potential would be orders of magnitude more efficient. In fact, that would have been my first choice as well. However, the major bottleneck I faced is the lack of reliable classical potential parameters for this specific quinary alloy system (Al-Mg-Mn-Sc-Zr). Developing a robust EAM or MEAM potential from scratch that satisfies the high thermodynamic accuracy required in such a complex multicomponent phase space was exceedingly difficult. Consequently, I opted to train and use a DeepMD model to ensure the chemical accuracy of the system.

Regarding the ZBL augmentation via hybrid/overlay:

I appreciate you pointing out the risk of introducing errors. I included the ZBL potential primarily because my DeepMD model was not trained on configurations with extremely short interatomic distances. To prevent non-physical atomic overlaps, I overlaid the ZBL potential.

  • I have set the ZBL cutoff to 2.4 Å.
  • Since the nearest neighbor distance in the Al matrix is ~2.86 Å and the current simulation is at 573 K, the interatomic distances rarely fall into the ZBL range.
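As a sanity check on that margin, one can evaluate the bare ZBL pair energy around those distances. Below is a Python sketch of the standard ZBL universal-screening form (constants from the usual Ziegler-Biersack-Littmark parametrization); note that LAMMPS additionally switches the potential smoothly to zero between the inner and outer cutoffs, which this sketch deliberately omits:

```python
import math

def zbl_energy(z1, z2, r):
    """Bare ZBL repulsive pair energy in eV for r in Angstrom
    (no switching function applied)."""
    qqr2e = 14.399645                         # e^2 / (4*pi*eps0), eV*Angstrom
    a = 0.46850 / (z1**0.23 + z2**0.23)       # universal screening length
    x = r / a
    phi = (0.18175 * math.exp(-3.19980 * x)
           + 0.50986 * math.exp(-0.94229 * x)
           + 0.28022 * math.exp(-0.40290 * x)
           + 0.02817 * math.exp(-0.20162 * x))
    return qqr2e * z1 * z2 / r * phi

# Al-Al (Z = 13) at the inner cutoff, outer cutoff, and matrix NN distance
for r in (1.9, 2.4, 2.86):
    print(f"r = {r:.2f} A: {zbl_energy(13, 13, r):.3f} eV")
```

Because the switching ramps the contribution to exactly zero at the outer cutoff (2.4 Å), pairs at the ~2.86 Å nearest-neighbor distance feel no ZBL at all; the bare values mainly show how steep the repulsion still is inside the 1.9–2.4 Å switching window.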

I am hoping that this setup effectively minimizes any perturbation to the DeepMD potential. Could you kindly advise if you consider this safety margin sufficient to avoid the accuracy issues you mentioned?

Thank you again for your valuable time.

In principle, if your force field is “local”, then the MC trial of a single atom swap can be performed very fast, since the energy change is fully determined by the local configuration around the swapped atoms (which involves only O(1) atoms instead of O(N)). I don’t know whether this applies to the force field you are using. Even if this optimization is possible, you would need to write your own code for it (LAMMPS does not do this), which is not a trivial task if you have not done similar things before. And your simulation would still be bottlenecked by the MD steps. So I would say using a more efficient force field is the right way to go.
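To illustrate the idea, here is a toy Python sketch (with a made-up short-ranged pair potential, purely for demonstration) showing that the energy change of a type swap can be computed from the neighbors of the two swapped atoms alone and agrees with a full O(N^2) recompute:

```python
import math, random

# Toy version of the "local" swap idea: for a short-ranged pair
# potential, the energy change of swapping the types of atoms i and j
# involves only their neighbors within the cutoff, not all N atoms.
# The pair strengths below are invented purely for illustration.

CUTOFF = 2.0
EPS = {(0, 0): 1.0, (0, 1): 1.5, (1, 1): 0.5}  # hypothetical type-pair strengths

def pair_e(t1, t2, r):
    if r >= CUTOFF:
        return 0.0
    eps = EPS[tuple(sorted((t1, t2)))]
    return eps * ((1.0 / r) ** 12 - (1.0 / r) ** 6)

def total_energy(pos, types):
    """Full O(N^2) energy -- what a naive swap trial would recompute."""
    n = len(pos)
    return sum(pair_e(types[i], types[j], math.dist(pos[i], pos[j]))
               for i in range(n) for j in range(i + 1, n))

def swap_delta_e(pos, types, i, j):
    """Local Delta-E of swapping types[i] and types[j]: only neighbor terms.
    The i-j pair itself is unchanged by the swap and can be skipped."""
    de = 0.0
    for a, new_t in ((i, types[j]), (j, types[i])):
        for k in range(len(pos)):
            if k in (i, j):
                continue
            r = math.dist(pos[a], pos[k])
            if r < CUTOFF:
                de += pair_e(new_t, types[k], r) - pair_e(types[a], types[k], r)
    return de

# Jittered 3x3x3 lattice with alternating types
random.seed(1)
pos = [(x + 0.1 * random.random(), y + 0.1 * random.random(),
        z + 0.1 * random.random())
       for x in range(3) for y in range(3) for z in range(3)]
types = [k % 2 for k in range(27)]

i, j = 0, 1                      # a type-0 and a type-1 atom
e_old = total_energy(pos, types)
de = swap_delta_e(pos, types, i, j)
types[i], types[j] = types[j], types[i]
e_new = total_energy(pos, types)
print(abs((e_new - e_old) - de))  # ~0: local Delta-E matches full recompute
```

With neighbor lists, the local version is O(number of neighbors) per trial rather than O(N), which is exactly the saving described above; doing this correctly in parallel inside LAMMPS is, as noted, a much bigger task.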

ZBL uses two cutoffs between which the potential is switched to zero, from the inner to the outer. Your reasoning for where to put those cutoffs has to be based on how you trained your ML potential: the ML model would have to be fully reliable across the range where the ZBL potential is switched on. However, since you don’t switch off the ML potential at the same time, you don’t know what it will “hallucinate” as interactions inside that range. You seem to assume that those interactions will be weak compared to ZBL, but do you have any proof of that? If not, you can get rather bogus results despite adding ZBL.

At any rate, the burden of proof is on you, and what an outsider like me thinks about it is no proof or justification.
