Problem building the GPU library

_zsjanyy · November 13, 2012, 11:43am

Recently I tried to build the GPU version of LAMMPS. Following the
instructions in the manual, I built libgpu.a and
nvc_get_devices successfully with OpenMPI. However, when I run
nvc_get_devices, the following error message appears:

./nvc_get_devices
Cuda driver error 100 in call at file './geryon/nvd_device.h' in line 232.
application called MPI_Abort(comm=0x44000000, -1) - process 0
In: PMI_Abort(-1, application called MPI_Abort(comm=0x44000000, -1) - process 0)

I switched to MPICH and the following error appears:
./nvc_get_devices
Cuda driver error 100 in call at file './geryon/nvd_device.h' in line 232.
Attempting to use an MPI routine before initializing MPICH

Since libgpu.so had been generated, I compiled LAMMPS and got
the final executable. However, when I run it, the
same error message shown above appears. I looked at line 232
of './geryon/nvd_device.h' but could not figure out what is
wrong. Could anyone give me some advice? Thank you very much.

sjplimp · November 13, 2012, 3:07pm

Mike or Christian may have a suggestion. My guess
is some incompatibility with your NVIDIA software.

Steve

akohlmey · November 13, 2012, 3:07pm

Recently I tried to build gpu version of lammps. Following the
instructions of the Manual, I have built the libgpu.a and
nvc_get_devices successfully with openmpi. However, when I execute the
nvc_get_devices, the following error message appears:

./nvc_get_devices
Cuda driver error 100 in call at file './geryon/nvd_device.h' in line 232.
application called MPI_Abort(comm=0x44000000, -1) - process 0
In: PMI_Abort(-1, application called MPI_Abort(comm=0x44000000, -1) - process 0)

This tells you that your GPU setup doesn't work,
or that you compiled the GPU library for a GPU architecture
different from the one you have.

I changed to mpich and the following error appears:
./nvc_get_devices
Cuda driver error 100 in call at file './geryon/nvd_device.h' in line 232.
Attempting to use an MPI routine before initializing MPICH

Since the error message is GPU related, how would changing
the MPI library correct it? That seems to be a
*very* strange way to solve a problem.

Have you been able to run any other GPU software?

Since there is libgpu.so have generated, I compiled the lammps and got
the final executable lammps version. However, when I execute it, the

The compilation of a library says nothing about being able to
execute GPU code. If you cannot run nvc_get_devices, then you
have to fix this first.
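
One way to check the GPU setup independently of LAMMPS and MPI is to ask the CUDA driver library directly how many devices it sees. The following is only a minimal sketch, assuming a Linux system where the driver library is installed as libcuda.so or libcuda.so.1 (those names, and the helper name probe_cuda_driver, are assumptions for illustration). It calls the driver API functions cuInit and cuDeviceGetCount through ctypes and reports the raw status code.

```python
import ctypes
import ctypes.util


def probe_cuda_driver():
    """Return (status, device_count).

    status is the integer CUresult code from the driver API, or None
    if no CUDA driver library could be loaded at all.
    """
    # Library names are assumptions for a typical Linux install.
    candidates = [ctypes.util.find_library("cuda"), "libcuda.so.1", "libcuda.so"]
    lib = None
    for name in candidates:
        if not name:
            continue
        try:
            lib = ctypes.CDLL(name)
            break
        except OSError:
            continue
    if lib is None:
        return None, 0

    # cuInit(0) must succeed before any other driver API call.
    status = lib.cuInit(0)
    if status != 0:
        return status, 0

    count = ctypes.c_int(0)
    status = lib.cuDeviceGetCount(ctypes.byref(count))
    return status, count.value


if __name__ == "__main__":
    status, count = probe_cuda_driver()
    if status is None:
        print("CUDA driver library not found")
    elif status != 0:
        print(f"CUDA driver error {status}")
    else:
        print(f"{count} CUDA device(s) visible")
```

If this reports a nonzero status or zero devices, the problem lies in the driver or hardware setup, and rebuilding LAMMPS or switching MPI libraries is unlikely to help.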

axel.

_Brown_W_Michael · November 13, 2012, 4:12pm

What axel said is correct. If nvc_get_devices doesn't work, this is typically a problem outside of LAMMPS. This error means that no CUDA device can be found on your system. I would recommend starting with the CUDA SDK and NVIDIA forums to get this solved. Good luck. - Mike
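
For reference, the number in the message can be looked up among the CUresult values in the CUDA driver API header (cuda.h), where 100 is CUDA_ERROR_NO_DEVICE. The sketch below pulls the code, file, and line out of a message in the format shown in this thread and maps the code to its name. The function names are hypothetical, and the table is only a small subset of the codes defined in cuda.h.

```python
import re

# Small subset of CUresult values from the CUDA driver API header (cuda.h).
CU_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    1: "CUDA_ERROR_INVALID_VALUE",
    2: "CUDA_ERROR_OUT_OF_MEMORY",
    3: "CUDA_ERROR_NOT_INITIALIZED",
    100: "CUDA_ERROR_NO_DEVICE",
    101: "CUDA_ERROR_INVALID_DEVICE",
    999: "CUDA_ERROR_UNKNOWN",
}

# Matches the message format quoted in this thread.
ERROR_PATTERN = re.compile(
    r"Cuda driver error (\d+) in call at file '([^']+)' in line (\d+)"
)


def describe_cu_result(code):
    """Return the symbolic name for a driver error code, if it is in the table."""
    return CU_RESULT_NAMES.get(code, f"unrecognized code {code} (see cuda.h)")


def parse_driver_error(line):
    """Extract code, name, file, and line number from an error line, or None."""
    match = ERROR_PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return {
        "code": code,
        "name": describe_cu_result(code),
        "file": match.group(2),
        "line": int(match.group(3)),
    }


if __name__ == "__main__":
    msg = ("Cuda driver error 100 in call at file "
           "'./geryon/nvd_device.h' in line 232.")
    print(parse_driver_error(msg))
```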