How to use partitions with GPUs effectively?

Hello,

I am running LAMMPS (27 Jan 2013).

I would like to run multiple independent simulation trajectories simultaneously from the same input script using the partition command. I am also using the USER-CUDA package.

However, when I do so, I notice a significant increase in computation time (relative to running with no partition), leading me to believe that I am somehow not invoking the commands properly or in the best possible way. Without using the USER-CUDA package, this problem is not as evident…
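For context, this is the kind of setup I mean (the executable name, input file, and seed values below are placeholders, and I am leaving out the USER-CUDA-specific switches):

mpirun -np 4 lmp_cuda -partition 4x1 -in in.multi

where in.multi is a small script along these lines, with a universe-style variable handing a different value to each partition:

# one seed per partition -> four independent trajectories
variable      seed universe 12345 23456 34567 45678

units         lj
atom_style    atomic
lattice       fcc 0.8442
region        box block 0 10 0 10 0 10
create_box    1 box
create_atoms  1 box
mass          1 1.0
pair_style    lj/cut 2.5
pair_coeff    1 1 1.0 1.0 2.5
velocity      all create 1.0 ${seed}
fix           1 all nve
run           1000

Each partition then writes its own log.lammps.N file.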

Submitting without partitions (allocating 1 core):

> Hello,
>
> I am running LAMMPS (27 Jan 2013).

then you probably are due for an update. :wink:

> I would like to run multiple independent simulation trajectories
> simultaneously from the same input script using the partition command. I am
> also using the USER-CUDA package.
>
> However, when I do so, I notice a significant increase in computation time
> (relative to running with no partition), leading me to believe that I am
> somehow not invoking the commands properly or in the best possible way.
> Without using the USER-CUDA package, this problem is not as evident...

the USER-CUDA package expects to have exclusive use of the GPU and is
tuned for that. attaching multiple jobs to the same GPU will
negatively impact performance.

you may want to try out whether the GPU package is a viable option
instead, since it is better suited to GPU oversubscription. but then
again, it cannot make something from nothing, so GPU sharing is only
useful if the GPU code does not fully occupy the GPU.
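
as a rough illustration only (the arguments of the package command have changed between versions, so check the package and suffix doc pages of whatever release you build), selecting the GPU package from an input script looks something like this:

package    gpu force/neigh 0 0 1.0   # old-style GPU package setup: GPU id 0, forces and neighbor lists on the GPU
suffix     gpu                       # run supported styles as their /gpu variants
pair_style lj/cut 2.5                # executed as lj/cut/gpu because of the suffix command

the suffix mechanism leaves the rest of the script unchanged, so switching between CPU and GPU runs does not require rewriting it.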

axel.

Thanks for the prompt response.

Regarding the exclusive use of the GPU, one thing that I meant to note was that I am submitting my jobs to a node that has 8 cores and 4 GPUs. Thus, I was expecting that each instance/simulation would have its own GPU. I am not sure if the USER-CUDA package is able to allocate these resources properly or if I can do that elsewhere… Do you know anything about this?

I will also try to build a newer version of LAMMPS ;-) with the GPU package as well.

> Thanks for the prompt response.
>
> Regarding the exclusive use of the GPU, one thing that I meant to note was
> that I am submitting my jobs to a node that has 8 cores and 4 GPUs. Thus, I
> was expecting that each instance/simulation would have its own GPU. I am not

unlikely, unless you set things up manually. you can easily check the
load on the individual GPUs with nvidia-smi while the job(s) are
running.
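
e.g. something like this on the compute node while the partitions are running:

nvidia-smi -l 1        # refresh the utilization/memory report every second
# or
watch -n 1 nvidia-smi

if only one of the listed GPUs shows a non-zero load, all your runs are piling onto the same device.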

> sure if the USER-CUDA package is able to allocate these resources properly
> or if I can do that elsewhere... Do you know anything about this?

check the documentation. i don't use the USER-CUDA package myself,
since its performance advantage lies in a regime where i don't run
simulations; there the GPU package is faster. in that case, you are
better off submitting individual jobs, or regular parallel jobs.

Sure-- the GPU package may well be better for my system sizes as well.

Using nvidia-smi as you suggested, I was able to see that, indeed, only one of the GPUs was being utilized.

If I then alter the input script to include

package cuda gpu/node 4

the load balancing is much better, and I recover good performance:

Submitting with 4x1 partitions (allocating 4 cores):
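
For anyone who finds this thread later, the combination that ended up working here is roughly the following (the executable and input file names are placeholders):

mpirun -np 4 lmp_cuda -partition 4x1 -in in.multi

with, near the top of in.multi (see the package doc page for the exact placement rules):

package cuda gpu/node 4   # declare 4 GPUs on the node; with this, the load was spread across all four GPUs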