Slow calculation speed with the meam potential function

Dear LAMMPS users,

I would like to ask about the slow calculation speed of LAMMPS when using the meam potential.
I am using two different potentials to calculate ion implantation: tersoff/zbl for Fe-Cr and meam for Fe-N. The command I use is fix deposit. The timings are shown in the attached txt document and the meam potential files are listed here.
FeN.meam (773 Bytes)
library.meam (423 Bytes)
log of time.txt (4.8 KB)

All calculations are performed on Linux with a GTX 3060 GPU and LAMMPS (23 Jun 2022). The CUDA version is 12.0. Fe-Cr is calculated using the GPU package, while Fe-N is calculated using KOKKOS. I tried using the GPU package for the Fe-N calculation before, but the calculation time was longer.

There is a lot of information missing to be able to make any assessment.

  • what are the input files and what exact command lines do you use in each case?
  • your log file shows that you are using 12 MPI processes, do they all use the same GPU?
  • what are the specific compilation settings, especially with the GPU package?
    The output of lmp -h can be very useful in that respect, especially the part that looks like this:
OS: Linux "Fedora Linux 38 (Thirty Eight)" 6.4.7-200.fc38.x86_64 x86_64

Compiler: GNU C++ 13.2.1 20230728 (Red Hat 13.2.1-1) with OpenMP 4.5
C++ standard: C++17
MPI v4.0: MPICH Version:	4.0.3
MPICH Release date:	Tue Nov  8 09:51:06 CST 2022
MPICH ABI:	14:3:2

Accelerator configuration:

KOKKOS package API: OpenMP Serial
KOKKOS package precision: double
OPENMP package API: OpenMP
OPENMP package precision: double

Active compile time flags:

-DLAMMPS_GZIP
-DLAMMPS_PNG
-DLAMMPS_JPEG
-DLAMMPS_FFMPEG
-DLAMMPS_SMALLBIG
sizeof(smallint): 32-bit
sizeof(imageint): 32-bit
sizeof(tagint):   32-bit
sizeof(bigint):   64-bit

Available compression formats:

Extension: .gz     Command: gzip
Extension: .bz2    Command: bzip2
Extension: .zst    Command: zstd
Extension: .xz     Command: xz
Extension: .lzma   Command: xz
Extension: .lz4    Command: lz4


Installed packages:

AMOEBA ASPHERE AWPMD BOCS BODY BPM BROWNIAN CG-DNA CG-SPICA CLASS2 COLLOID 
COLVARS COMPRESS CORESHELL DIELECTRIC DIFFRACTION DIPOLE DPD-BASIC DPD-MESO 
DPD-REACT DPD-SMOOTH DRUDE EFF ELECTRODE EXTRA-COMPUTE EXTRA-DUMP EXTRA-FIX 
EXTRA-MOLECULE EXTRA-PAIR FEP GRANULAR INTERLAYER KOKKOS KSPACE LEPTON MACHDYN 
MANYBODY MC MEAM MESONT MISC ML-IAP ML-POD ML-SNAP MOFFF MOLECULE MPIIO OPENMP 
OPT ORIENT PERI PHONON PLUGIN POEMS PYTHON QEQ REACTION REAXFF REPLICA RIGID 
SHOCK SPH SPIN SRD TALLY UEF VORONOI YAFF 

Please note that you have a consumer GPU with poor support for double precision floating point, so you will likely see improved performance with the GPU package when it is compiled for mixed or single precision. The KOKKOS package only supports all-double precision and thus can only be fairly compared to the GPU package when that is configured for all-double precision, too.
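
For reference, a minimal sketch of how that precision choice is typically made at CMake configure time, if you build with CMake; the GPU_PREC, GPU_API, and GPU_ARCH settings and the sm_86 value for your card are assumptions to check against the build documentation for your LAMMPS version:

# sketch: configure the GPU package for mixed precision (usually fastest on consumer GPUs)
cmake ../cmake -D PKG_GPU=on -D GPU_API=cuda -D GPU_PREC=mixed -D GPU_ARCH=sm_86
# reconfigure with all-double precision for a fair comparison against KOKKOS
cmake ../cmake -D PKG_GPU=on -D GPU_API=cuda -D GPU_PREC=double -D GPU_ARCH=sm_86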

–what are the input files and what exact command lines do you use in each case?
–I’m sorry, I don’t quite understand what you mean by input files. My calculations were carried out for a single-crystal iron target containing 300000 atoms. The exact command lines are:
fix 1 all nve
fix 2 addatoms deposit 10 2 30000 12345 region sput near 1 vx -74 -74 vz -605 -605 units box

–your log file shows that you are using 12 MPI processes, do they all use the same GPU?
–My command line is:
Fe-Cr: mpirun -np 12 lmp_mpi -sf gpu -pk gpu 1 -in in.implant -package gpu 0 neigh no
Fe-N: mpirun -np 12 lmp_kokkos_cuda_mpi -k on g 1 -sf kk -in in.implant -pk kokkos newton on neigh half

–what are the specific compilation settings, especially with the GPU package?
–OS: Linux Ubuntu 22.04.2 5.19.0-50-generic #50-Ubuntu SMP PREEMPT_DYNAMIC UTC x86_64 x86_64 x86_64

Compiler: GNU ld (GNU Binutils for Ubuntu) 2.38; gcc 11.4.0
C++ standard: C++20
MPI v4.0: MPICH Version: 3.3.2
MPICH Release date: Tue Nov 12 21:23:16 CST 2019

Installed packages:
ASPHERE BOCS BODY BPM BROWNIAN CG-SPICA CLASS2 COLLOID CORESHELL DIELECTRIC
DIFFRACTION DIPOLE DPD-BASIC DPD-MESO DPD-REACT DPD-SMOOTH DRUDE EFF
EXTRA-COMPUTE EXTRA-DUMP EXTRA-FIX EXTRA-PAIR FEP GPU GRANULAR INTEL INTERLAYER
KOKKOS MANIFOLD MANYBODY MC MEAM MESONT MGPT MISC ML-IAP ML-RANN ML-SNAP MOFFF
MOLECULE OPENMP OPT ORIENT PERI PHONON PLUGIN PTM QEQ QTB REACTION REAXFF
REPLICA RIGID SHOCK SMTBQ SPH SPIN SRD TALLY UEF VORONOI YAFF

I’m sorry, I really don’t know how to obtain some of this data. I hope its absence will not affect the analysis.

The GPU package is compiled for mixed precision, but I do not know how to set the precision of the KOKKOS package.

Looking forward to your reply.

The input file is in.implant in your case.

This is useless. It is important to see the entire file.

Not in this case.

The problem is that you are trying to use a feature that does not exist in your version of LAMMPS. It is too old. You need the stable release from 2 August 2023 (lammps/lammps on GitHub).
So when you request to run MEAM on the GPU, it is not actually run on the GPU; the CPU version is used instead.
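
A minimal sketch of how a KOKKOS/CUDA build of that newer release could be configured so that the meam Kokkos variant is compiled in; the Kokkos_ARCH_AMPERE86 setting assumes an Ampere card like the 3060, and the option names should be verified against that release's build documentation:

# sketch: KOKKOS+CUDA build of the 2 Aug 2023 release with the MEAM and MANYBODY packages
cmake ../cmake -D PKG_KOKKOS=on -D PKG_MEAM=on -D PKG_MANYBODY=on \
      -D Kokkos_ENABLE_CUDA=yes -D Kokkos_ARCH_AMPERE86=yes
cmake --build . --parallel
# then run with the kk suffix, same as your existing command line
mpirun -np 12 ./lmp -k on g 1 -sf kk -pk kokkos newton on neigh half -in in.implant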

You cannot. As I already said, KOKKOS only supports double precision, and thus your GPU is not very suitable for it due to its significantly reduced number of double precision units compared with data center GPUs.

Thanks for your reply. I will try again with the new version.

The performance summary does not show any GPU usage statistics.

One basic piece of information you haven’t considered (or told us you have considered) is that tersoff/zbl has a GPU variant as well as a Kokkos variant (as the manual says), while meam only has a Kokkos variant (as the manual again says). So you simply won’t be getting good speed for meam on your GPU.
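
If in doubt, a quick way to check which variants a given binary actually contains is to search the style listing in its help output; this sketch assumes the executable names used earlier in the thread:

# the -h output lists all compiled-in pair styles, including accelerated variants such as meam/kk
./lmp_kokkos_cuda_mpi -h | grep -i meam
./lmp_mpi -h | grep -i tersoff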

From a scientific viewpoint, it seems very strange to compare results from two entirely different force fields, unless they are completely separate studies you’re running. Different force fields have different assumptions, approximations and underlying mathematical models, so it’s very difficult for you to know if any difference you see between both simulations is because of the materials’ difference, or simply the different force fields.

I made the comparison between different potentials to demonstrate how slow the calculation speed under the meam potential is for me. I think the slow speed is caused by the type of potential. I have also seen that, although the meam potential is slow, it is not this slow.

The meam potential has no GPU variant. Does that mean the GPU is not used in my calculations?