Invoking the GPU with the Windows version of LAMMPS

Is this the proper command to make LAMMPS use my K5200 GPU?

mpiexec -localonly 8 lmp_mpi -sf gpu -pk gpu 1 -in in.reaxc.rdx > rdx.out

I have modified in.reaxc.rdx to include the newton on command.

Thanks for any responses

I am using LAMMPS (15 May 2015-ICMS)

Jim Kress

The output of ocl_get_devices is:

C:\Program Files\LAMMPS 64-bit 20150616\bin>ocl_get_devices

Found 1 platform(s).

Using platform: NVIDIA Corporation NVIDIA CUDA OpenCL 1.2 CUDA 7.5.9

Device 0: "Quadro K5200"

Type of device: GPU

Double precision support: Yes

Total amount of global memory: 8 GB

Number of compute units/multiprocessors: 12

Total amount of constant memory: 65536 bytes

Total amount of local/shared memory per block: 49152 bytes

Maximum group size (# of threads per block) 1024

Maximum item sizes (# threads for each dim) 1024 x 1024 x 64

Clock rate: 0.771 GHz

ECC support: No

Device fission into equal partitions: No

Device fission by counts: No

Device fission by affinity: No

Maximum subdevices from fission: 1

Device 1: "Quadro 4000"

Type of device: GPU

Double precision support: Yes

Total amount of global memory: 2 GB

Number of compute units/multiprocessors: 8

Total amount of constant memory: 65536 bytes

Total amount of local/shared memory per block: 49152 bytes

Maximum group size (# of threads per block) 1024

Maximum item sizes (# threads for each dim) 1024 x 1024 x 64

Clock rate: 0.95 GHz

ECC support: No

Device fission into equal partitions: No

Device fission by counts: No

Device fission by affinity: No

Maximum subdevices from fission: 1
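To confirm which index the K5200 has, a listing like the one above can be read programmatically. This is only a minimal Python sketch: `parse_devices` and `SAMPLE` are hypothetical names, the sample is a shortened copy of the listing above, and the "Key: value" layout is assumed from this one output rather than from any documented format.

```python
import re

# Shortened copy of the ocl_get_devices listing above (illustrative only).
SAMPLE = """\
Found 1 platform(s).
Using platform: NVIDIA Corporation NVIDIA CUDA OpenCL 1.2 CUDA 7.5.9
Device 0: "Quadro K5200"
Type of device: GPU
Total amount of global memory: 8 GB
Number of compute units/multiprocessors: 12
Device 1: "Quadro 4000"
Type of device: GPU
Total amount of global memory: 2 GB
Number of compute units/multiprocessors: 8
"""


def parse_devices(text):
    """Return one dict per 'Device N: "name"' block found in the text."""
    devices = []
    for line in text.splitlines():
        line = line.strip()
        header = re.match(r"Device (\d+): (.+)$", line)
        if header:
            # Strip straight or curly quotes around the device name.
            name = header.group(2).strip('"\u201c\u201d')
            devices.append({"index": int(header.group(1)), "name": name})
            continue
        # Ignore anything before the first device and lines without a colon.
        if not devices or ":" not in line:
            continue
        key, value = line.split(":", 1)
        devices[-1][key.strip()] = value.strip()
    return devices


for dev in parse_devices(SAMPLE):
    print(dev["index"], dev["name"], dev["Total amount of global memory"])
```

Lines before the first device header, such as the platform line, are skipped on purpose so that they are not attached to a device.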

And the content of log.lammps generated by said command is:

LAMMPS (15 May 2015-ICMS)

WARNING: OMP_NUM_THREADS environment is not set. (…/comm.cpp:89)

using 1 OpenMP thread(s) per MPI task

package gpu 1

package gpu 1

ReaxFF potential for RDX system

this run is equivalent to reax/in.reax.rdx

units real

newton on

atom_style charge

read_data data.rdx

orthogonal box = (35 35 35) to (48 48 48)

2 by 2 by 2 MPI processor grid

reading atoms …

21 atoms

pair_style reax/c control.reax_c.rdx

pair_coeff * * ffield.reax C H O N

compute reax all pair reax/c

variable eb equal c_reax[1]

variable ea equal c_reax[2]

variable elp equal c_reax[3]

variable emol equal c_reax[4]

variable ev equal c_reax[5]

variable epen equal c_reax[6]

variable ecoa equal c_reax[7]

variable ehb equal c_reax[8]

variable et equal c_reax[9]

variable eco equal c_reax[10]

variable ew equal c_reax[11]

variable ep equal c_reax[12]

variable efi equal c_reax[13]

variable eqeq equal c_reax[14]

neighbor 2.5 bin

neigh_modify every 10 delay 0 check no

fix 1 all nve

fix 2 all qeq/reax 1 0.0 10.0 1.0e-6 reax/c

thermo 10

thermo_style custom step temp epair etotal press v_eb v_ea v_elp v_emol v_ev v_epen v_ecoa v_ehb v_et v_eco v_ew v_ep v_efi v_eqeq

timestep 1.0

#dump 1 all atom 10 dump.reaxc.rdx

#dump 2 all image 25 image.*.jpg type type # axes yes 0.8 0.02 view 60 -30

#dump_modify 2 pad 3

#dump 3 all movie 25 movie.mpg type type # axes yes 0.8 0.02 view 60 -30

#dump_modify 3 pad 3

run 100

Neighbor list info …

2 neighbor list requests

update every 10 steps, delay 0 steps, check no

master list distance cutoff = 12.5

Memory usage per processor = 11.3935 Mbytes

Step Temp E_pair TotEng Press eb ea elp emol ev epen ecoa ehb et eco ew ep efi eqeq

0 0 -1884.3081 -1884.3081 27186.178 -2958.4712 79.527715 0.31082031 0 98.589783 25.846176 -0.18034154 0 16.709078 -9.1620736 938.43732 -244.79981 0 168.88445

10 1288.6115 -1989.6644 -1912.8422 -19456.352 -2734.6769 -15.607219 0.20177961 0 54.629557 3.125229 -77.7067 0 14.933901 -5.8108542 843.92074 -180.43321 0 107.75934

20 538.95844 -1942.7037 -1910.5731 -10725.661 -2803.7395 7.9078326 0.077926683 0 81.610046 0.22951932 -57.557102 0 30.331203 -10.178049 878.99015 -159.69247 0 89.316704

30 463.09527 -1933.5765 -1905.9685 -33255.508 -2749.8591 -8.015461 0.027628739 0 81.627406 0.11972398 -50.26228 0 20.82032 -9.632703 851.8872 -149.49539 0 79.206121

40 885.49546 -1958.9125 -1906.1227 -4814.6602 -2795.644 9.1506113 0.13747487 0 70.948056 0.24360554 -57.862694 0 19.076515 -11.141211 873.73892 -159.99391 0 92.434067

50 861.1612 -1954.4601 -1903.121 -1896.6972 -2784.8449 3.8269556 0.15793303 0 79.851646 3.3492094 -78.06613 0 32.628941 -7.9565312 872.81848 -190.9857 0 114.75999

60 1167.7836 -1971.8435 -1902.2246 -3482.8401 -2705.864 -17.121532 0.22749081 0 44.507713 7.8560062 -74.789009 0 16.25651 -4.6046704 835.83079 -188.3369 0 114.19414

70 1439.9913 -1989.3025 -1903.4556 23845.778 -2890.7894 31.958717 0.26671721 0 85.758358 3.1804063 -71.002948 0 24.357193 -10.311288 905.86811 -175.38499 0 106.79672

80 502.39872 -1930.755 -1900.8039 -20356.345 -2703.8112 -18.662647 0.11286147 0 99.803603 2.0329517 -76.171319 0 19.236871 -6.2786547 826.47441 -166.03145 0 92.539593

90 749.08377 -1946.984 -1902.3264 17798.642 -2863.7584 42.068701 0.24338049 0 96.181649 0.96183585 -69.955518 0 24.615447 -11.582751 903.68869 -190.13824 0 120.69123

100 1109.6942 -1968.5879 -1902.4322 -4490.3571 -2755.8987 -7.1225982 0.21757676 0 61.805995 7.0826206 -75.645463 0 20.115343 -6.2372537 863.56466 -198.5695 0 122.09938

Loop time of 0.52758 on 8 procs for 100 steps with 21 atoms

91.3% CPU use with 8 MPI tasks x 1 OpenMP threads

Performance: 16.377 ns/day 1.465 hours/ns 189.545 timesteps/s

MPI task timings breakdown:

Section | min time | avg time | max time |%varavg| %total
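The Loop time and Performance lines in the log above are related by simple arithmetic, which is handy when comparing runs with and without an accelerator. The following is a minimal Python sketch, assuming the two summary lines have exactly the layout shown in this log; the function names are hypothetical, and the 1.0 fs step comes from "timestep 1.0" with "units real" in the input.

```python
import re

# Two summary lines copied from the log.lammps output above.
LOG_SUMMARY = """\
Loop time of 0.52758 on 8 procs for 100 steps with 21 atoms
Performance: 16.377 ns/day 1.465 hours/ns 189.545 timesteps/s
"""

FS_PER_NS = 1.0e6
SECONDS_PER_DAY = 86400.0


def parse_loop_line(text):
    """Return (loop_seconds, procs, steps, atoms) from a 'Loop time' line."""
    m = re.search(
        r"Loop time of (\S+) on (\d+) procs for (\d+) steps with (\d+) atoms",
        text,
    )
    if m is None:
        raise ValueError("no 'Loop time' line found")
    return float(m.group(1)), int(m.group(2)), int(m.group(3)), int(m.group(4))


def throughput(loop_seconds, steps, timestep_fs):
    """Return (timesteps per second, simulated ns per wall-clock day)."""
    steps_per_s = steps / loop_seconds
    ns_per_day = steps_per_s * timestep_fs * SECONDS_PER_DAY / FS_PER_NS
    return steps_per_s, ns_per_day


loop_s, procs, steps, atoms = parse_loop_line(LOG_SUMMARY)
# 'timestep 1.0' with 'units real' means 1.0 fs per step.
steps_per_s, ns_per_day = throughput(loop_s, steps, timestep_fs=1.0)
print(f"{steps_per_s:.3f} timesteps/s, {ns_per_day:.3f} ns/day")
```

The computed values should agree with the Performance line to its printed precision. With only 21 atoms, these numbers say little about what any accelerator could do for a large system.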

> Is this the proper command to invoke the utilization of my K5200 GPU for use
> with LAMMPS?

> mpiexec -localonly 8 lmp_mpi -sf gpu -pk gpu 1 -in in.reaxc.rdx > rdx.out

technically, yes. practically, no.
there is no /gpu version of the reax/c pair style, so it will make no
difference.
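The fallback described in this reply can be pictured with a toy model. This is a Python sketch, not LAMMPS code: `AVAILABLE_STYLES` is a hypothetical, hand-written set and `resolve_style` is a made-up helper. It only illustrates the idea that a suffix is applied when an accelerated variant exists and is otherwise silently ignored.

```python
# Toy model only: AVAILABLE_STYLES is a hypothetical, hand-written subset,
# not a list read from any real LAMMPS binary.
AVAILABLE_STYLES = frozenset({"lj/cut", "lj/cut/gpu", "reax/c"})


def resolve_style(name, suffix=None, available=AVAILABLE_STYLES):
    """Pick the suffixed variant if it exists, else fall back to the plain style."""
    if suffix is not None:
        candidate = f"{name}/{suffix}"
        if candidate in available:
            return candidate
    if name in available:
        return name
    raise KeyError(f"unknown style: {name}")


# A style with a /gpu variant in the toy set is switched to it ...
print(resolve_style("lj/cut", suffix="gpu"))
# ... while reax/c has no /gpu variant, so the plain CPU style is used.
print(resolve_style("reax/c", suffix="gpu"))
```

In this model the second call returns the plain "reax/c", which is why the command line is accepted without complaint but the pair style still runs on the CPU.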

> Where I have modified in.reaxc.rdx to include the newton on command.

which is pointless, since it is the default.

> there is no /gpu version of the reax/c pair style, so it will make no difference.

So, there is no utilization of NVidia cards of any kind associated with reax or reax/c in LAMMPS or it just wasn't included in the Windows version download?

I had read the PuReMD version of reax had been implemented in LAMMPS. Was that incorrect?

>> Where I have modified in.reaxc.rdx to include the newton on command.
> which is pointless, since it is the default.

According to the manual, specifying -sf gpu turns newton off. I have observed this to be true: if I do not include newton on in the input file (and use the command line given previously), I get this error message:

ERROR: Pair style reax/c requires newton pair on (../pair_reax_c.cpp:357)

Thank you.

Jim Kress

>> there is no /gpu version of the reax/c pair style, so it will make no difference.

> So, there is no utilization of NVidia cards of any kind associated with reax or reax/c in LAMMPS or it just wasn't included in the Windows version download?

i already answered that.

> I had read the PuReMD version of reax had been implemented in LAMMPS. Was that incorrect?

no.

>>> Where I have modified in.reaxc.rdx to include the newton on command.
>> which is pointless, since it is the default.

> According to the manual, the specification of -sf gpu turns newton off. I have observed this to be true, since, if I do not include newton on in the input file (and use the command line as given previously), I get this error message:

> ERROR: Pair style reax/c requires newton pair on (../pair_reax_c.cpp:357)

all /gpu styles must be compatible with newton pair off.
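The interaction reported above (the gpu suffix switching newton pair off, reax/c refusing to run without it) can be checked before launching a job. This is a rough Python sketch under those stated assumptions: `needs_explicit_newton_on` is a hypothetical helper, it is not a real LAMMPS input parser, and it only looks at the first word of each line.

```python
def needs_explicit_newton_on(script_text, uses_sf_gpu):
    """Return True if the script uses reax/c but would end up with newton pair off.

    Assumption taken from this thread: '-sf gpu' starts the run with
    newton pair off, and reax/c requires newton pair on.
    """
    newton_pair = "off" if uses_sf_gpu else "on"
    uses_reaxc = False
    for raw in script_text.splitlines():
        words = raw.split("#", 1)[0].split()  # drop comments, then tokenize
        if not words:
            continue
        if words[0] == "newton" and len(words) >= 2:
            newton_pair = words[1]  # the first flag is the pairwise setting
        elif words[0] == "pair_style" and len(words) >= 2:
            uses_reaxc = uses_reaxc or words[1] == "reax/c"
    return uses_reaxc and newton_pair != "on"


WITH_NEWTON = """\
units real
newton on
pair_style reax/c control.reax_c.rdx
"""

WITHOUT_NEWTON = """\
units real
pair_style reax/c control.reax_c.rdx
"""

print(needs_explicit_newton_on(WITH_NEWTON, uses_sf_gpu=True))
print(needs_explicit_newton_on(WITHOUT_NEWTON, uses_sf_gpu=True))
```

Under these assumptions the second script is the case that produces the "requires newton pair on" error, while the same script run without the suffix is fine because newton on is the default.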

I'm confused.

I asked a two part question:

> So, there is no utilization of NVidia cards of any kind associated with reax or reax/c in LAMMPS or it just wasn't included in the Windows version download?

It was stated "I already answered that". Answered what?

That there is no utilization of NVidia cards of any kind associated with reax or reax/c in LAMMPS?

Or, there is utilization of NVidia cards of any kind associated with reax or reax/c in LAMMPS, but it was not included in the Windows version download?

Also,

I asked:

I had read the PuReMD version of reax had been implemented in LAMMPS. Was that incorrect?

The reply was:

no.

That generates another set of questions:

Does that mean the PuReMD version of reax IS implemented in LAMMPS? Is it the gpu version?

But, the Windows version does not contain the gpu version of PuReMD?

If the Windows version does not contain the gpu version of PuReMD, how does one obtain the LAMMPS version that does contain it?

Thank you.

Jim Kress

James,

The PuReMD code was included in (and adapted to) LAMMPS while it was CPU-only. GPU capabilities in it are very recent.

Oleg

"James Kress" <[email protected]...> wrote on 30 June 2015 at 16:01:23:

In it = in PuReMD.

Oleg Sergeev <[email protected]...> wrote on 30 June 2015 at 16:35:53:

> I'm confused.

> I asked a two part question:

>> So, there is no utilization of NVidia cards of any kind associated with reax or reax/c in LAMMPS or it just wasn't included in the Windows version download?

> It was stated "I already answered that". Answered what?

> That there is no utilization of NVidia cards of any kind associated with reax or reax/c in LAMMPS?

> Or, there is utilization of NVidia cards of any kind associated with reax or reax/c in LAMMPS, but it was not included in the Windows version download?

which part of "there is no /gpu version of the reax/c pair style" did
you not understand?
please show me in the LAMMPS documentation where it says differently.

Oleg,

Thank you for your clear and comprehensible response.

Do you know if there is any plan to incorporate the GPU version of PuReMD
into LAMMPS?

Thanks again.

Jim Kress

Less incivility and more illumination would be appreciated. I am new to LAMMPS and am merely trying to understand what it can and cannot do. The manual is not clear, at least to my inexperienced eyes, as to whether or not a gpu version of the reaxc pair style exists.

Oleg's response provided the information I required.

Thank you.

Jim Kress

Jim,

As far as I know, that is unlikely. A Kokkos-enabled reax implementation in LAMMPS is more probable (Kokkos is a library from Sandia that, ideally, automatically uses whatever hardware you have on your machine: GPU, Xeon Phi, ...).

Oleg

"James Kress" <[email protected]...> wrote on 30 June 2015 at 16:56:47:

Oleg,

Thank you for the information. Is there a time frame for this or is it just
on the to-do list of the LAMMPS developers?

Jim

It is mostly an informed guess, so there is no particular time frame and no guarantee it will be in LAMMPS at all.

Oleg

"James Kress" <[email protected]...> wrote on 30 June 2015 at 17:10:19:

I believe the GPU version of PuReMD is called either g-PuReMD or PuReMD-GP. However, it is not in LAMMPS. Please ask its developer, Prof. M. Aktulga (hma .at. cse.msu.edu), whether there are any plans to incorporate it into LAMMPS.

We are currently implementing the Kokkos version of ReaxFF in LAMMPS, but there is no specific time frame for its release.

Ray

spending a few mins with google (try searching for 'PureMD GPU') leads to:

Sudhir Kylasa, H. Metin Aktulga, A.Y. Grama. PuReMD-GPU: A reactive molecular dynamics simulation package for GPUs. Journal of Computational Physics 272:343–359 (09/2014). DOI: 10.1016/j.jcp.2014.04.035

ABSTRACT We present an efficient and highly accurate GP-GPU
implementation of our community code, PuReMD, for reactive molecular
dynamics simulations using the ReaxFF force field. PuReMD and its
incorporation into LAMMPS (Reax/C) is used by a large number of
research groups worldwide for simulating diverse systems ranging from
biomembranes to explosives (RDX) at atomistic level of detail. The
sub-femtosecond time-steps associated with ReaxFF strongly motivate
significant improvements to per-timestep simulation time through
effective use of GPUs. This paper presents, in detail, the design and
implementation of PuReMD-GPU, which enables ReaxFF simulations on
GPUs, as well as various performance optimization techniques we
developed to obtain high performance on state-of-the-art hardware.
Comprehensive experiments on model systems (bulk water and amorphous
silica) are presented to quantify the performance improvements
achieved by PuReMD-GPU and to verify its accuracy. In particular, our
experiments show up to 16× improvement in runtime compared to our
highly optimized CPU-only single-core ReaxFF implementation.
PuReMD-GPU is a unique production code, and is currently available on
request from the authors.

jim, you could have saved yourself a *lot* of grief, especially after
reading the last sentence.

Axel,

Thank you for the amicable response.

I am familiar with the paper you located. In my discussions with the authors it appears I was confused by their response. I thought the PuReMD-GPU version of the code had been incorporated in LAMMPS. Apparently I misunderstood (or made an ill-informed assumption).

OK. Well so much for that. I'll get their code (which, of course, has no Windows version) and see what I can do with it.

We've done some remarkable things with ReaxFF in the Biomolecular Space. I just need a faster version since the systems are so large.

Thanks for your help.

Jim

James Kress Ph.D., President
The KressWorks Foundation ©
