Missing libcudart.so.4 library?

Hi,

I am trying to compile and run lammps with gpu support (and, once that works, cuda). I installed the cuda libraries from nvidia. I compiled lammps with make no-gpu and tested it; it runs fine. Next I compiled lib/gpu with Make.gfortran and got the libgpu.a and Makefile.lammps files. I then recompiled lammps with make yes-gpu, which compiles fine and produces the executable lmp_openmpi.

Testing the new exe file I get the following:
schall2@…2898…:~$ lmp_openmpi -h
lmp_openmpi: error while loading shared libraries: libcudart.so.4: cannot open shared object file: No such file or directory

A quick check of the makefile in lib/gpu shows that:
gpu_SYSINC =
gpu_SYSLIB = -lcudart -lcuda
gpu_SYSPATH = -L/usr/local/cuda/lib64

And a look in /usr/local/cuda/lib64 turns up libcudart.so.4 linked to libcudart.so.4.0.17. So the file appears to exist despite arguments to the contrary.

Also nvc_get_devices throws the same error:
schall2@…2898…:/usr/local/lib/lammps/lib/gpu$ ./nvc_get_devices
./nvc_get_devices: error while loading shared libraries: libcudart.so.4: cannot open shared object file: No such file or directory

Any ideas?

Thanks.

those are two different issues. -L tells the "static" linker
(ld) where to find libraries that an executable can be linked to.
for shared libraries, only the symbolic name of the library
is encoded into the binary, _not_ its location (as it can change).

the dynamic linker (e.g. ld-linux.so.2) then needs to find
the libcudart.so.4 file at runtime. that linker looks through
the directories specified in /etc/ld.so.conf, in files contained
in /etc/ld.so.conf.d/, and in the contents of $LD_LIBRARY_PATH;
the latter has precedence.
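
To see which libraries the dynamic linker fails to resolve, `ldd` can be run on the binary. The following is only a sketch: the helper name `missing_libs` is made up for illustration, and the executable name and CUDA directory are the ones quoted in this thread.

```shell
# Hypothetical helper (not part of LAMMPS or CUDA): list the shared
# libraries that the dynamic linker cannot resolve for a given binary.
missing_libs() {
    # ldd prints one line per dependency; unresolved ones end in "not found".
    # "|| true" keeps the exit status zero when nothing is missing.
    ldd "$1" 2>/dev/null | grep 'not found' || true
}

# Usage, assuming the executable from this thread is in the current directory:
#   missing_libs ./lmp_openmpi
# Repeat with the CUDA directory added for this one command only:
#   LD_LIBRARY_PATH=/usr/local/cuda/lib64 ldd ./lmp_openmpi | grep cudart
```

If the first call lists libcudart.so.4 and the second shows it resolved to a file under /usr/local/cuda/lib64, the problem is purely the runtime search path.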

it is also possible to encode a default library search path
(an "rpath") directly into a binary with a (linux) linker option.

in your case, you would add to gpu_SYSPATH:
-Wl,-rpath,/usr/local/cuda/lib64

this is an option that i personally prefer, since this
makes executables work more easily in standalone
mode, and one can encode several common locations
into the binary.
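
To check that the path really ended up in the executable after relinking, the dynamic section can be inspected with `readelf` from binutils. This is a sketch under assumptions: `show_rpath` is a made-up helper name, and the entry may appear as either RPATH or RUNPATH depending on the linker defaults.

```shell
# Hypothetical helper: print the run-time search path entries (RPATH or
# RUNPATH) recorded in the dynamic section of an ELF binary.
show_rpath() {
    # Prints nothing if the binary has no such entry or cannot be read.
    readelf -d "$1" 2>/dev/null | grep -E 'RPATH|RUNPATH' || true
}

# Usage, assuming the executable from this thread:
#   show_rpath ./lmp_openmpi
```

After adding the -Wl,-rpath option, the output should contain /usr/local/cuda/lib64; empty output means the option did not reach the link step.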

HTH,
     axel.

Thanks Axel,

Yes, it does help, “gpu_SYSPATH = -L/usr/local/cuda/lib64 -Wl,-rpath,/usr/local/cuda/lib64” fixes the lmp executable. Any ideas on the nvc_get_devices error? Or does it even matter, as the gpu examples seem to test out fine?

Dave

it is the same problem and you can use the same trick.

in general, the instructions for installing cuda should have
recommended setting LD_LIBRARY_PATH properly in any case;
had that been done, you would not have seen this message.
the same thing happens for _any_ shared library that you link
to and that is not placed in a standard location or in a
directory listed in LD_LIBRARY_PATH.
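
A minimal sketch of setting LD_LIBRARY_PATH, assuming the CUDA libraries live in the directory quoted in this thread. The lines would typically go into a shell startup file such as ~/.bashrc; the variable name CUDA_LIB_DIR and the cuda.conf file name in the comments are illustrative choices, not something prescribed by the cuda installer.

```shell
# Directory from this thread; adjust if CUDA is installed elsewhere.
CUDA_LIB_DIR=/usr/local/cuda/lib64

# Prepend the directory, adding a ":" separator only if the variable
# already has a value (an empty entry would mean "current directory").
export LD_LIBRARY_PATH="${CUDA_LIB_DIR}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# System-wide alternative (needs root; cuda.conf is a hypothetical file name):
#   echo /usr/local/cuda/lib64 > /etc/ld.so.conf.d/cuda.conf
#   ldconfig
```

With either approach, both lmp_openmpi and nvc_get_devices should find libcudart.so.4 without being relinked.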

> Or does it even matter, as the gpu examples seem to
> test out fine?

nvc_get_devices is a test program that can be used
to identify your GPUs and to debug potential issues.

cheers,
    axel.