Is this the proper command to make LAMMPS use my K5200 GPU?
mpiexec -localonly 8 lmp_mpi -sf gpu -pk gpu 1 -in in.reaxc.rdx > rdx.out
I have modified in.reaxc.rdx to include the newton on command.
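For reference, my understanding is that the -sf gpu and -pk gpu 1 switches should be equivalent to adding lines like the following near the top of the input script (the device count of 1 is just my assumption for running only on the K5200; please correct me if the command-line form does not do the same thing):

package gpu 1     # equivalent of the -pk gpu 1 switch: use 1 GPU per node
suffix  gpu       # equivalent of the -sf gpu switch: append /gpu to supported styles
newton  on        # the setting I added to in.reaxc.rdx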
Thanks for any responses.
I am using LAMMPS (15 May 2015-ICMS)
Jim Kress
The output of ocl_get_devices is:
C:\Program Files\LAMMPS 64-bit 20150616\bin>ocl_get_devices
Found 1 platform(s).
Using platform: NVIDIA Corporation NVIDIA CUDA OpenCL 1.2 CUDA 7.5.9
Device 0: "Quadro K5200"
Type of device: GPU
Double precision support: Yes
Total amount of global memory: 8 GB
Number of compute units/multiprocessors: 12
Total amount of constant memory: 65536 bytes
Total amount of local/shared memory per block: 49152 bytes
Maximum group size (# of threads per block) 1024
Maximum item sizes (# threads for each dim) 1024 x 1024 x 64
Clock rate: 0.771 GHz
ECC support: No
Device fission into equal partitions: No
Device fission by counts: No
Device fission by affinity: No
Maximum subdevices from fission: 1
Device 1: "Quadro 4000"
Type of device: GPU
Double precision support: Yes
Total amount of global memory: 2 GB
Number of compute units/multiprocessors: 8
Total amount of constant memory: 65536 bytes
Total amount of local/shared memory per block: 49152 bytes
Maximum group size (# of threads per block) 1024
Maximum item sizes (# threads for each dim) 1024 x 1024 x 64
Clock rate: 0.95 GHz
ECC support: No
Device fission into equal partitions: No
Device fission by counts: No
Device fission by affinity: No
Maximum subdevices from fission: 1
The content of log.lammps generated by that command is:
LAMMPS (15 May 2015-ICMS)
WARNING: OMP_NUM_THREADS environment is not set. (…/comm.cpp:89)
using 1 OpenMP thread(s) per MPI task
package gpu 1
package gpu 1
ReaxFF potential for RDX system
this run is equivalent to reax/in.reax.rdx
units real
newton on
atom_style charge
read_data data.rdx
orthogonal box = (35 35 35) to (48 48 48)
2 by 2 by 2 MPI processor grid
reading atoms …
21 atoms
pair_style reax/c control.reax_c.rdx
pair_coeff * * ffield.reax C H O N
compute reax all pair reax/c
variable eb equal c_reax[1]
variable ea equal c_reax[2]
variable elp equal c_reax[3]
variable emol equal c_reax[4]
variable ev equal c_reax[5]
variable epen equal c_reax[6]
variable ecoa equal c_reax[7]
variable ehb equal c_reax[8]
variable et equal c_reax[9]
variable eco equal c_reax[10]
variable ew equal c_reax[11]
variable ep equal c_reax[12]
variable efi equal c_reax[13]
variable eqeq equal c_reax[14]
neighbor 2.5 bin
neigh_modify every 10 delay 0 check no
fix 1 all nve
fix 2 all qeq/reax 1 0.0 10.0 1.0e-6 reax/c
thermo 10
thermo_style custom step temp epair etotal press v_eb v_ea v_elp v_emol v_ev v_epen v_ecoa v_ehb v_et v_eco v_ew v_ep v_efi v_eqeq
timestep 1.0
#dump 1 all atom 10 dump.reaxc.rdx
#dump 2 all image 25 image.*.jpg type type # axes yes 0.8 0.02 view 60 -30
#dump_modify 2 pad 3
#dump 3 all movie 25 movie.mpg type type # axes yes 0.8 0.02 view 60 -30
#dump_modify 3 pad 3
run 100
Neighbor list info …
2 neighbor list requests
update every 10 steps, delay 0 steps, check no
master list distance cutoff = 12.5
Memory usage per processor = 11.3935 Mbytes
Step Temp E_pair TotEng Press eb ea elp emol ev epen ecoa ehb et eco ew ep efi eqeq
0 0 -1884.3081 -1884.3081 27186.178 -2958.4712 79.527715 0.31082031 0 98.589783 25.846176 -0.18034154 0 16.709078 -9.1620736 938.43732 -244.79981 0 168.88445
10 1288.6115 -1989.6644 -1912.8422 -19456.352 -2734.6769 -15.607219 0.20177961 0 54.629557 3.125229 -77.7067 0 14.933901 -5.8108542 843.92074 -180.43321 0 107.75934
20 538.95844 -1942.7037 -1910.5731 -10725.661 -2803.7395 7.9078326 0.077926683 0 81.610046 0.22951932 -57.557102 0 30.331203 -10.178049 878.99015 -159.69247 0 89.316704
30 463.09527 -1933.5765 -1905.9685 -33255.508 -2749.8591 -8.015461 0.027628739 0 81.627406 0.11972398 -50.26228 0 20.82032 -9.632703 851.8872 -149.49539 0 79.206121
40 885.49546 -1958.9125 -1906.1227 -4814.6602 -2795.644 9.1506113 0.13747487 0 70.948056 0.24360554 -57.862694 0 19.076515 -11.141211 873.73892 -159.99391 0 92.434067
50 861.1612 -1954.4601 -1903.121 -1896.6972 -2784.8449 3.8269556 0.15793303 0 79.851646 3.3492094 -78.06613 0 32.628941 -7.9565312 872.81848 -190.9857 0 114.75999
60 1167.7836 -1971.8435 -1902.2246 -3482.8401 -2705.864 -17.121532 0.22749081 0 44.507713 7.8560062 -74.789009 0 16.25651 -4.6046704 835.83079 -188.3369 0 114.19414
70 1439.9913 -1989.3025 -1903.4556 23845.778 -2890.7894 31.958717 0.26671721 0 85.758358 3.1804063 -71.002948 0 24.357193 -10.311288 905.86811 -175.38499 0 106.79672
80 502.39872 -1930.755 -1900.8039 -20356.345 -2703.8112 -18.662647 0.11286147 0 99.803603 2.0329517 -76.171319 0 19.236871 -6.2786547 826.47441 -166.03145 0 92.539593
90 749.08377 -1946.984 -1902.3264 17798.642 -2863.7584 42.068701 0.24338049 0 96.181649 0.96183585 -69.955518 0 24.615447 -11.582751 903.68869 -190.13824 0 120.69123
100 1109.6942 -1968.5879 -1902.4322 -4490.3571 -2755.8987 -7.1225982 0.21757676 0 61.805995 7.0826206 -75.645463 0 20.115343 -6.2372537 863.56466 -198.5695 0 122.09938
Loop time of 0.52758 on 8 procs for 100 steps with 21 atoms
91.3% CPU use with 8 MPI tasks x 1 OpenMP threads
Performance: 16.377 ns/day 1.465 hours/ns 189.545 timesteps/s
MPI task timings breakdown:
Section | min time | avg time | max time |%varavg| %total