[lammps-users] GPUs on multiple nodes:

Axel,

Indeed, I do need to read the documentation again, however:

The first place I came across this is:
http://lammps.sandia.gov/workshops/Feb10/Mike_Brown/gpu_tut.pdf

It is also given in the LAMMPS manual (attached: 36,37.pdf), page 25 (36 of 779),
and in pair_lj.html (attached) from the doc directory of the tarball.

My question should have been: Is it possible to run GPU-accelerated
LAMMPS on multiple nodes with a GPU each? If yes, how?

As usual, expecting miracles from you!
Thanks,
Manish Agarwal
<[email protected]...>
Postdoctoral Research Scientist
Chemical Engineering
Columbia University
- - - - - - - - - - - - - - - - - - - - - - - - - - -

pair_lj.html (12.2 KB)

36,37.pdf (28.8 KB)

manish,

> Axel,
>
> Indeed, I do need to read the documentation again, however:

yes!

> The first place I came across this is:
> http://lammps.sandia.gov/workshops/Feb10/Mike_Brown/gpu_tut.pdf

but that is almost a year old!!
the gpu code has been updated more than once since.

> It is also given in the LAMMPS manual (attached: 36,37.pdf), page 25 (36 of 779),
> and in pair_lj.html (attached) from the doc directory of the tarball.

please use an up-to-date version.

> My question should have been: Is it possible to run GPU-accelerated
> LAMMPS on multiple nodes with a GPU each? If yes, how?

yes, it is possible and it can be quite fast:
http://sites.google.com/site/akohlmey/software/lammps-benchmarks
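
for example, with Open MPI a launch across two nodes with one GPU and
8 cores each could look roughly like this (a sketch only; the hostfile
name, host names, task counts, and input script are placeholders you
need to adapt to your cluster):

# hypothetical hostfile "gpu_hosts", one line per node:
#   node01 slots=8
#   node02 slots=8
mpirun -np 16 -npernode 8 -hostfile gpu_hosts ./lmp_openmpi < in.lj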

it is in the documentation. you must have read it,
since your single-node input was in accord with the
current documentation, but conflicted with the documentation
that you are quoting and attaching.

> As usual, expecting miracles from you!

i am still working on inventing the "remote cluebat"

axel.

The GPU tutorial is now too old.
pair_lj had some old text left in; this is fixed in the current on-line docs.
Read "Running on GPUs" in section_start to get the current instructions.

Paper on gpu methods here:
http://dx.doi.org/10.1016/j.cpc.2010.12.021

- mike

Dear Axel and Michael,

Following the latest documentation, available on the website, I have

newton off
pair_style lj/cut/gpu 2.5
fix 0 all gpu force 0 0 -1

while running (mpirun -np 16 ./lmp_openmpi < control), the GPU section
of the log is:

--------------------------------------------------------------------------
- Using GPGPU acceleration for lj/cut:
- with 8 procs per device.
--------------------------------------------------------------------------
GPU 0: Tesla C2050, 448 cores, 1.1/2.6 GB, 1.1 GHZ (Double Precision)
--------------------------------------------------------------------------

Initializing GPU and compiling on process 0...Done.
Initializing GPU 0 on core 0...Done.
Initializing GPU 0 on core 1...Done.
Initializing GPU 0 on core 2...Done.
Initializing GPU 0 on core 3...Done.
Initializing GPU 0 on core 4...Done.
Initializing GPU 0 on core 5...Done.
Initializing GPU 0 on core 6...Done.
Initializing GPU 0 on core 7...Done.

and the preceding CPU section is:
2 by 2 by 4 processor grid

Query: Does this still mean the second GPU on the second node is active?

yes. it says that you are using 8 MPI tasks per GPU.
with 16 MPI tasks total, that corresponds to 2 GPUs:
16 / 8 = 2.
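
if you want to double-check, you can also look at the GPUs directly
while the job is running (a sketch only; this assumes nvidia-smi is
installed on the compute nodes and node01/node02 are placeholder host
names):

ssh node01 nvidia-smi
ssh node02 nvidia-smi

each node should report activity on its GPU while the run is going.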

i would also recommend trying:

mpirun -np 2 -npernode 1
mpirun -np 4 -npernode 2
mpirun -np 6 -npernode 3
....
and so on to determine which degree of
oversubscription of the GPU provides the
best performance.
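
on 2 nodes, such a scan could be scripted roughly like this (a sketch
only; the executable and input names follow the ones you used above,
and the loop bounds are arbitrary):

for n in 1 2 3 4 5 6 7 8; do
    mpirun -np $((2*n)) -npernode $n ./lmp_openmpi < control > screen.np$n
done

then compare the loop times reported at the end of each run.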

axel.