Problem running file with GPU

Hi everyone,

I hope you are doing well.
I need to use GCMC to study adsorption. As I understand it, GCMC can only use one core, which makes the runs very time-consuming.
Based on what I read on the mailing list, I decided to build LAMMPS with GPU support to speed up the runs.
I’m using Ubuntu 20.04 LTS, NVIDIA driver 450.57 for GK208B [GeForce GT 710], and CUDA 11.0.1.

I installed LAMMPS (21 Jul 2020) with CMake, and ./lmp -h gives:

Invalid MIT-MAGIC-COOKIE-1 key
Large-scale Atomic/Molecular Massively Parallel Simulator - 21 Jul 2020
Git info (master / patch_21Jul2020-30-g41535d8de)

Usage example: ./lmp -var t 300 -echo screen -in in.alloy

List of command line options supported by this LAMMPS executable:

-echo none/screen/log/both : echoing of input script (-e)
-help : print this help message (-h)
-in filename : read input from file, not stdin (-i)
-kokkos on/off … : turn KOKKOS mode on or off (-k)
-log none/filename : where to send log output (-l)
-mpicolor color : which exe in a multi-exe mpirun cmd (-m)
-nocite : disable writing log.cite file (-nc)
-package style … : invoke package command (-pk)
-partition size1 size2 … : assign partition sizes (-p)
-plog basename : basename for partition logs (-pl)
-pscreen basename : basename for partition screens (-ps)
-restart2data rfile dfile … : convert restart to data file (-r2data)
-restart2dump rfile dgroup dstyle dfile …
: convert restart to dump file (-r2dump)
-reorder topology-specs : processor reordering (-r)
-screen none/filename : where to send screen output (-sc)
-suffix gpu/intel/opt/omp : style suffix to apply (-sf)
-var varname value : set index style variable (-v)

OS: Linux 5.4.0-42-generic on x86_64

Compiler: GNU C++ 9.3.0 with OpenMP 4.5
C++ standard: C++11
MPI v3.1: Open MPI v4.0.3, package: Debian OpenMPI, ident: 4.0.3, repo rev: v4.0.3, Mar 03, 2020

Active compile time flags:

-DLAMMPS_GZIP
-DLAMMPS_PNG
-DLAMMPS_JPEG
-DLAMMPS_FFMPEG
-DLAMMPS_SMALLBIG
sizeof(smallint): 32-bit
sizeof(imageint): 32-bit
sizeof(tagint): 32-bit
sizeof(bigint): 64-bit

Installed packages:

ASPHERE BODY COLLOID CORESHELL DIPOLE GPU KOKKOS KSPACE MANYBODY MC MISC MOLECULE OPT RIGID USER-DRUDE USER-MEAMC USER-OMP
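
For reference, the GPU-enabled build was configured with CMake roughly along these lines (a sketch only; the source path and the GPU_ARCH value of sm_35 for the GT 710 are assumptions):

  # from a build directory next to the LAMMPS cmake/ folder
  cmake -D PKG_GPU=on -D GPU_API=cuda -D GPU_ARCH=sm_35 -D BUILD_MPI=yes ../cmake
  cmake --build . -j 4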

I ran a file containing the NPT ensemble once on the CPU and again on the GPU. The result was as follows:
Using the CPU took 7 s while using the GPU took 54 s. (Which is very strange!)

I also ran the GCMC input that Mr. Axel Kohlmeyer posted on GitHub. In both the GPU and the CPU case only one core was used, and the timings were as follows:
Using the CPU took 7 s while using the GPU took 17 s. (Related files are attached.)
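
To illustrate what I mean by running "through the CPU" and "through the GPU", the two runs look roughly like this (the input file name is just a placeholder; the exact command line may differ):

  # CPU run on a single core
  ./lmp -in in.npt

  # same input with the GPU package styles (gpu suffix), using 1 GPU
  ./lmp -sf gpu -pk gpu 1 -in in.npt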

The following error appears when I use more than one core:

Invalid MIT-MAGIC-COOKIE-1 key
LAMMPS (21 Jul 2020)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:96)
using 1 OpenMP thread(s) per MPI task
Lattice spacing in x,y,z = 5.0000000 5.0000000 5.0000000
Created orthogonal box = (0.0000000 0.0000000 0.0000000) to (10.000000 10.000000 10.000000)
1 by 2 by 2 MPI processor grid
Read molecule template co2mol:
1 molecules
3 atoms with max type 2
2 bonds with max type 1
1 angles with max type 1
0 dihedrals with max type 0
0 impropers with max type 0
Created 24 atoms
create_atoms CPU = 0.002 seconds
24 atoms in group co2
create bodies CPU = 0.000 seconds
8 rigid bodies with 24 atoms
1.1600000 = max distance from body owner to body atom
dynamic group carbon defined
dynamic group oxygen defined
Ewald initialization …
using 12-bit tables for long-range coulomb (src/kspace.cpp:330)
G vector (1/distance) = 0.23411209
estimated absolute RMS force accuracy = 0.033665616
estimated relative force accuracy = 0.000101383
KSpace vectors: actual max1d max3d = 16 2 62
kxmax kymax kzmax = 2 2 2
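
The multi-core run above was launched roughly like this (the input file name is a placeholder; 4 ranks match the 1 by 2 by 2 processor grid reported in the output):

  # run on 4 MPI ranks instead of 1
  mpirun -np 4 ./lmp -in in.gcmc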

gcmc.rar (8.81 KB)

npt.rar (34 KB)

> I ran a file containing the NPT ensemble once on the CPU and again on the GPU. The result was as follows:
> Using the CPU took 7 s while using the GPU took 54 s. (Which is very strange!)

This is not strange at all. You must be running a *tiny* system, and there is significant overhead in initializing a GPU. The GPU you have is not a very capable (or recent) GPU to begin with, so there is not a lot of performance gain to be expected even if you run a more suitable input.

Please note that GPU acceleration only has a performance advantage if the system is sufficiently large. The individual GPU cores have less compute power than a CPU core, and the GPU kernels need to be implemented in a way that is better parallelizable (compared to what e.g. the USER-OMP package does), which comes at the cost of executing roughly twice as many instructions. This only pays off when you have a *huge* number of work units, so you can take advantage of the massively parallel GPU hardware (with many more cores than a CPU).
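
As an illustration of the size effect, assuming the stock bench/in.lj input that ships with LAMMPS (it scales its box through x/y/z index variables), you can watch the CPU-vs-GPU balance change as the atom count grows:

  # small system: GPU initialization overhead dominates
  ./lmp -in bench/in.lj -sf gpu -pk gpu 1

  # 64x more atoms: this is where a GPU can start to pay off
  ./lmp -in bench/in.lj -var x 4 -var y 4 -var z 4 -sf gpu -pk gpu 1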

> I also ran the GCMC input that Mr. Axel Kohlmeyer posted on GitHub. In both the GPU and the CPU case only one core was used, and the timings were as follows:
> Using the CPU took 7 s while using the GPU took 17 s. (Related files are attached.)

> The following error appears when I use more than one core:

That is a limitation of fix gcmc and has nothing to do with GPUs. Please see the error message from fix gcmc: you cannot use MPI parallelization with the settings in that input deck.
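
If you want to use more than one core anyway, one option that stays within that limitation (sketched here with placeholder names; whether it actually helps depends on the input) is to keep a single MPI rank and thread the force computation with the USER-OMP package that is already in your build:

  # 1 MPI task, 4 OpenMP threads for the omp-suffixed styles
  OMP_NUM_THREADS=4 mpirun -np 1 ./lmp -sf omp -pk omp 4 -in in.gcmc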

axel.

Thank you for your answer.
So, in general, LAMMPS is not suitable for GCMC.
What method do you think I should use for adsorption?
Many thanks

Ali

> Thank you for your answer.
> So, in general, LAMMPS is not suitable for GCMC.

It is good for the scenarios it was designed for; it is less good for applications that "abuse" it. LAMMPS is an MD code, not an MC code, and it has been designed to do MD efficiently in parallel for many applications and potentials. Parallelization through spatial decomposition and MC are a difficult combination, and so is the effective treatment of long-range electrostatics.

> What method do you think I should use for adsorption?

That is not my area of research. I suggest you survey the published literature to find out what methods and tools people are using to study the systems and processes you are interested in.

axel.

I will definitely do this.
Thanks a lot.