Question about scaled rhodo

Hello,
I see the following command for a scaled rhodo run:

mpirun -np 16 lmp_mpi -var x 2 -var y 2 -var z 4 -in in.rhodo.scaled

On my Ryzen 7 CPU (8 cores, 16 threads) with one RTX 3080 (10 GB memory), I run this command and it works fine:

lmp_mpi -var x 2 -var y 2 -var z 4 -sf gpu -in ~/lammps-2Aug2023/bench/in.rhodo.scaled
...
Replication is creating a 2x2x4 = 16 times larger system...
  orthogonal box = (-27.5 -38.5 -36.3646) to (82.5 115.5 254.5398)
  1 by 1 by 1 MPI processor grid
  512000 atoms
  443568 bonds
...

Note that I didn't specify the number of MPI processes there; I ran lmp_mpi directly, without mpirun. Now, if I run:

lmp_mpi -var x 3 -var y 2 -var z 4 -sf gpu -in ~/lammps-2Aug2023/bench/in.rhodo.scaled

it fails with the following error:

Replication is creating a 3x2x4 = 24 times larger system...
  orthogonal box = (-27.5 -38.5 -36.3646) to (137.5 115.5 254.5398)
  1 by 1 by 1 MPI processor grid
  768000 atoms
...
Generated 2278 of 2278 mixed pair_coeff terms from arithmetic mixing rule
Setting up Verlet run ...
  Unit style    : real
  Current step  : 0
  Time step     : 2
Cuda driver error 1 in call at file 'geryon/nvd_kernel.h' in line 338.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

I am not sure if that is due to

  1. a GPU memory shortage, or
  2. the replication factor 3x2x4 = 24 being larger than my 16 threads.

According to nvidia-smi, the memory usage just before the error is about 3.7 GB, so I don't think the error is due to a memory shortage.
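In case nvidia-smi's default refresh is too slow to catch a short-lived allocation spike, one thing I could try is polling the memory counters at a higher rate while the run starts up (just a monitoring suggestion, nothing LAMMPS-specific):

nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader -lms 200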

Any idea about that?

I think differently. I have a GPU with 6 GB of RAM and can run a 3x2x2 = 12 times larger system, but not a 3x2x3 = 18 times larger one, so it sounds plausible that you cannot run a 3x2x4 = 24 times larger system with 10 GB. You can easily test this by dialing down to a 3x2x3 = 18 times system. The large memory use is due to the neighbor lists, which are the major consumer of RAM in classical MD.
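Concretely, reusing your command from above, the dialed-down test would be:

lmp_mpi -var x 3 -var y 2 -var z 3 -sf gpu -in ~/lammps-2Aug2023/bench/in.rhodo.scaled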

Please note that in the GPU case you are running with just one MPI process and no threads. The -var flags only determine how many times the system is replicated in x, y, and z, i.e. the size of the system; they say nothing about the number of processes or threads.
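If you do want several MPI ranks sharing your one GPU, that is set via mpirun, independently of the replication factors. A sketch, assuming your binary was built with the GPU package (-pk gpu 1 selects one GPU):

mpirun -np 8 lmp_mpi -sf gpu -pk gpu 1 -var x 2 -var y 2 -var z 4 -in ~/lammps-2Aug2023/bench/in.rhodo.scaled

This only changes how the domain is split among CPU ranks; it does not change the system size.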

With 3x2x2 I get the same error. OK, I accept that it may be due to a memory shortage.