Cuda settings in Makefile

_Luis_Goncalves · August 15, 2011, 5:58pm

Hi everyone!

I could not find any information in the Makefiles about cuda_SYSINC,
cuda_SYSLIB and cuda_SYSPATH. The web version of the manual says
that all settings are given in Makefile.g++, but nothing is there either.
Is this file not up to date?

Best,
Luis Goncalves

akohlmey · August 15, 2011, 6:13pm

none of the makefiles have yet been updated for the user-cuda package.

axel.

_Christian_Muller · August 15, 2011, 7:17pm

Hi

No changes are necessary in the main Makefiles for the USER-CUDA package.
The CUDA path has to be given in lammps/lib/cuda/Makefile.common only.
It is picked up automatically in the main compilation. For anyone interested in the details: lammps/src/Makefile.package is included by lammps/src/MAKE/Makefile.* and in turn includes lammps/lib/cuda/Makefile.common.

Cheers
Christian


sjplimp · August 15, 2011, 7:48pm

Christian is correct. That paragraph of the LAMMPS manual is in error.
There are no cuda_SYS variables in the low-level Makefiles. Just
include the USER-CUDA package and you should be ready to
build LAMMPS with it.

Steve

_Luis_Goncalves · August 15, 2011, 8:05pm

Hi Christian,

I checked those makefiles and they seem OK. However, I cannot run
the example in.phosphate.cuda properly. The log says:

PPPM initialization ...
  G vector = 0.210111
  grid = 108 108 108
  stencil order = 5
  RMS precision = 8.76251e-06
  using double precision FFTs
  brick FFT buffer size/proc = 1520875 1259712 158700
# CUDA: VerletCuda::setup: Allocate memory on device for maximum of 295650 atoms...
# CUDA: Using precision: Global: 4 X: 4 V: 4 F: 4 PPPM: 4
Setting up run ...
# CUDA: VerletCuda::setup: Upload data...
# CUDA: Total Device Memory useage post setup: 124.820312 MB
Memory usage per processor = 1177.66 Mbytes
Step Temp E_pair E_mol TotEng Press Volume
       0 400.30257 -2381941.4 0 -2366643.6 -450.02396 4242016.4
WARNING: # CUDA: You asked for a Verlet integration using Cuda, but selected a pair force which has not yet been ported to Cuda
WARNING: # CUDA: You asked for a Verlet integration using Cuda, but selected a kspace force which has not yet been ported to Cuda
WARNING: # CUDA: You asked for a Verlet integration using Cuda, but several fixes have not yet been ported to Cuda.
This can cause a severe speed penalty due to frequent data synchronization between host and GPU.

This job is taking minutes to complete 100 steps...
What could be the problem?

Best,
Luis

_Christian_Muller · August 15, 2011, 9:25pm

You did not actually select the CUDA styles; the warnings are telling you that the CPU styles are being used. In this case you need to use "pppm/cuda" instead of "pppm" in the script and pass "-sf cuda" as a command-line argument.
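
To check a log for this situation quickly, something along the following lines might help. It is only a rough sketch: the function name `unported_styles` is made up for illustration, and the warning texts it looks for are simply copied from the log quoted above, so other LAMMPS versions may word them differently. Whitespace is stripped before matching so that hard-wrapped log lines still match.

```python
import re

# Warning fragments copied from the log quoted earlier in this thread
# (assumption: other LAMMPS versions may phrase them differently).
# They are stored without whitespace because the log text is squeezed
# the same way before matching, which makes hard-wrapped lines harmless.
_NEEDLES = {
    "pair": "selectedapairforcewhichhasnotyetbeenportedtoCuda",
    "kspace": "selectedakspaceforcewhichhasnotyetbeenportedtoCuda",
    "fix": "severalfixeshavenotyetbeenportedtoCuda",
}


def unported_styles(log_text):
    """Return a sorted list of style kinds the log reports as not ported to CUDA."""
    squeezed = re.sub(r"\s+", "", log_text)
    return sorted(kind for kind, needle in _NEEDLES.items() if needle in squeezed)


if __name__ == "__main__":
    sample = (
        "WARNING: # CUDA: You asked for a Verlet integration using Cuda, but\n"
        "selected a pair force which has not yet been por\n"
        "ted to Cuda\n"
    )
    print(unported_styles(sample))  # ['pair']
```

An empty list means none of these three warnings appeared; it does not by itself prove that every style ran on the GPU.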

Cheers
Christian


_Luis_Goncalves · August 16, 2011, 1:51pm

Good! I seem to be making progress... Now I face another problem when I run a
buck/coul/long pair force with CUDA using pppm/cuda:

LAMMPS (14 Aug 2011)
# Using LAMMPS_CUDA
USER-CUDA mode is enabled
Lattice spacing in x,y,z = 5.9712 11.9424 5.9712
Created orthogonal box = (0 0 0) to (35.8272 35.8272 35.8272)
  1 by 1 by 1 processor grid
Created 3888 atoms
864 atoms in group Li
Setting atom values ...
  864 settings made for charge
864 atoms in group Si
Setting atom values ...
  864 settings made for charge
2160 atoms in group O
Setting atom values ...
  2160 settings made for charge
# CUDA: Activate GPU
PPPMCuda initialization ...
  G vector = 0.307625
  grid = 24 24 24
  stencil order = 7
  RMS precision = 3.42634e-05
  brick FFT buffer size/proc = 35937 13824 16335
WARNING: # CUDA: You asked for the useage of Coulomb Tables. This is not supported in CUDA Pair forces. Setting is ignored.

# CUDA: VerletCuda::setup: Allocate memory on device for maximum of 10000 atoms...
# CUDA: Using precision: Global: 4 X: 4 V: 4 F: 4 PPPM: 4
Setting up run ...
# CUDA: VerletCuda::setup: Upload data...
Test TpA
Test BpA

# CUDA: Timing of parallelisation layout with 10 loops:
# CUDA: BpA TpA
0.047665 0.051288
# CUDA: Total Device Memory useage post setup: 96.445312 MB
Memory usage per processor = 8.61047 Mbytes
Step Temp E_pair E_mol TotEng Press
0 3600.0001 -54140.566 0 -52331.803 266209.51

  If I use cpu pppm, this script runs fine.
  Any ideas?
  Thank you for your help!

Luis

_Christian_Muller · August 16, 2011, 1:55pm

Uhm what exactly is your problem?

From your output everything seems to launch just fine.

Christian
