I decided to create two folders
1- lammps-11Aug17-m2000 where I set CUDA_ARCH=-arch=sm_50
2- lammps-11Aug-17-c2075 where I set CUDA_ARCH=-arch=sm_21
Does that eliminate the multiple-GPU problem, for which I would otherwise have to wait for a patch?
As you can see below, for the first one the sm_50 symbols are present in lmp_mpi
and that device is number 0. However, the LAMMPS command fails:
[email protected]...:~/lammps-11Aug17-m2000$ strings ~/lammps-11Aug17-m2000/src/lmp_mpi | grep sm_50
.target sm_50
.target sm_50
[email protected]...:~/lammps-11Aug17-m2000$ cd ../eam/
[email protected]...:~/eam$ mpirun -np 4 ~/lammps-11Aug17-m2000/src/lmp_mpi -sf gpu -pk gpu 0 -in in.eam
LAMMPS (11 Aug 2017)
ERROR: Illegal package gpu command (../fix_gpu.cpp:86)
Last command: package gpu 0
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
Process name: [[21154,1],1]
Exit code: 1
--------------------------------------------------------------------------
[email protected]...:~/eam$ ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery/deviceQuery | grep M2000
Device 0: "Quadro M2000"
> Peer access from Quadro M2000 (GPU0) -> Tesla C2075 (GPU1) : No
> Peer access from Tesla C2075 (GPU1) -> Quadro M2000 (GPU0) : No
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.0, CUDA Runtime
Version = 8.0, NumDevs = 2, Device0 = Quadro M2000, Device1 = Tesla C2075
How can that be explained?
simple. you didn't pay sufficient attention to the documentation.
specifically, when using an older version of LAMMPS you *must* look at the
documentation matching your specific version, as the syntax or semantics of
commands may differ from the current version. the LAMMPS homepage
always shows the documentation for the latest patch/development version
(not the stable one).
for the current version of LAMMPS "-pk gpu 0" means use *all* available
GPUs. for your version, this "wildcard" does not exist, so you have to use
a number > 0.
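
as a rough sketch of what that means for the failing command above: the only change is the argument after "-pk gpu". the helper name lmp_gpu_cmd is made up for illustration, and it only prints the command line so it can be checked before launching anything.

```shell
# sketch, assuming the 11 Aug 2017 build and the paths from the failing run.
LMP="$HOME/lammps-11Aug17-m2000/src/lmp_mpi"

# hypothetical helper: print the mpirun command for a given GPU count
lmp_gpu_cmd() {
    # $1 = number of GPUs per node; this LAMMPS version requires a value > 0
    if [ "$1" -lt 1 ]; then
        echo "error: Ngpu must be > 0 for this version" >&2
        return 1
    fi
    echo "mpirun -np 4 $LMP -sf gpu -pk gpu $1 -in in.eam"
}

# dry run: show the corrected command with one GPU requested
lmp_gpu_cmd 1
```

copy the printed line into the shell (or pipe it to sh) once it looks right.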
to select individual GPUs, you cannot use the LAMMPS command line or script
commands, but have to set the CUDA_VISIBLE_DEVICES environment variable to
instruct the nvidia driver which GPUs are visible. the fact that the CUDA
utility (deviceQuery) reports the GPUs as #0 and #1 is irrelevant for LAMMPS
itself. it *does* matter for the CUDA_VISIBLE_DEVICES environment variable, though.
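
a minimal sketch of how the two builds could be paired with CUDA_VISIBLE_DEVICES on a single machine. the function names are made up for illustration, the src/lmp_mpi path inside the second folder is an assumption, and the device numbers are the ones deviceQuery reported (0 = Quadro M2000, 1 = Tesla C2075). since only one GPU is visible to each run, "-pk gpu 1" is used. the script only prints the command lines.

```shell
# hypothetical helper: map a device number to the binary compiled for it
lmp_for_device() {
    case "$1" in
        0) echo "$HOME/lammps-11Aug17-m2000/src/lmp_mpi" ;;   # sm_50 build
        1) echo "$HOME/lammps-11Aug-17-c2075/src/lmp_mpi" ;;  # sm_21 build (path assumed)
        *) echo "error: unknown device $1" >&2; return 1 ;;
    esac
}

# hypothetical helper: print the full command line for one device (dry run)
run_on_device() {
    lmp=$(lmp_for_device "$1") || return 1
    # the single visible GPU is the only one the run can use, hence "-pk gpu 1"
    echo "CUDA_VISIBLE_DEVICES=$1 mpirun -np 4 $lmp -sf gpu -pk gpu 1 -in in.eam"
}

run_on_device 0
run_on_device 1
```

inside each run the one visible GPU is renumbered as device 0 by the CUDA runtime, whichever physical card it is.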
Something is going crazy here...
PEBCAC!
axel.