GPU Lammps : libcuda.so.1: cannot open shared object file

_Vaidyanathan_M.S · November 15, 2017, 6:32pm

Hi LAMMPS Users

I am trying to install the GPU version of LAMMPS. I was able to compile LAMMPS with the GPU package, and the build produced the executable lmp_mesabi. Details about the machine architecture and the LAMMPS version are at the end of this email.

However, when I try to run it, it fails with the error

error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory

From reading previous threads, I realized that this is a problem with dynamic linking.

When I execute

vsethura@…7239… [~/mylammps/src] % ldd lmp_mesabi

linux-vdso.so.1 => (0x00007ffdf1bfa000)
/lib64/snoopy.so (0x00007f2941afd000)
libmpi.so.12 => /panfs/roc/intel/x86_64/2016/parallel_studio_xe_msi/compilers_and_libraries_2016.3.210/linux/mpi/intel64/lib/libmpi.so.12 (0x00007f294132e000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f2941111000)
libjpeg.so.62 => /usr/lib64/libjpeg.so.62 (0x00007f2940ec1000)
libcudart.so.8.0 => /panfs/roc/msisoft/cuda/8.0/lib64/libcudart.so.8.0 (0x00007f2940c5b000)
libcuda.so.1 => not found
libdl.so.2 => /lib64/libdl.so.2 (0x00007f2940a57000)
libstdc++.so.6 => /panfs/roc/msisoft/gcc/4.9.2_2/lib64/libstdc++.so.6 (0x00007f2940745000)
libmkl_intel_lp64.so => /panfs/roc/intel/x86_64/2016/parallel_studio_xe_msi/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_intel_lp64.so (0x00007f293fc35000)
libmkl_core.so => /panfs/roc/intel/x86_64/2016/parallel_studio_xe_msi/compilers_and_librares_2016.3.210/linux/mkl/lib/intel64/libmkl_core.so (0x00007f293e224000)
libmkl_sequential.so => /panfs/roc/intel/x86_64/2016/parallel_studio_xe_msi/compilers_and_libraries_2016.3.210/linux/mkl/lib/intel64/libmkl_sequential.so (0x00007f293d54b000)
libm.so.6 => /lib64/libm.so.6 (0x00007f293d2c7000)
libmpifort.so.12 => /panfs/roc/intel/x86_64/2016/parallel_studio_xe_msi/compilers_and_libraries_2016.3.210/linux/mpi/intel64/lib/libmpifort.so.12 (0x00007f293cf29000)
librt.so.1 => /lib64/librt.so.1 (0x00007f293cd21000)
libgcc_s.so.1 => /panfs/roc/msisoft/gcc/4.9.2_2/lib64/libgcc_s.so.1 (0x00007f293cb0b000)
libc.so.6 => /lib64/libc.so.6 (0x00007f293c777000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2941cfe000)

As the output shows, libcuda.so.1 is not found.
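With a long `ldd` listing like the one above, it can help to print only the libraries the loader cannot resolve. A minimal POSIX shell sketch; the helper name `unresolved_libs` is hypothetical and not part of LAMMPS or CUDA. It reads `ldd` output on standard input:

```shell
#!/bin/sh
# Hypothetical helper: print the name of every shared library that ldd
# reports as "not found". Reads ldd output on standard input.
unresolved_libs() {
    # Matching ldd lines look like "libcuda.so.1 => not found";
    # awk's default field splitting makes the library name field 1.
    awk '/=> not found/ { print $1 }'
}

# Example usage on the executable from this thread:
#   ldd ./lmp_mesabi | unresolved_libs
```

Applied to the `ldd` output above, it would print only `libcuda.so.1`.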

But I did create a symbolic link to the library file. For instance, when I execute

vsethura@…7239… [~/mylammps/lib/gpu] % ls -l libcuda.so.1

lrwxrwxrwx 1 vsethura dorfmank 50 Nov 15 11:30 libcuda.so.1 -> /panfs/roc/msisoft/cuda/8.0/lib64/stubs/libcuda.so

which I would expect to mean that it is correctly linked.
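Two things work against this symlink: the dynamic loader does not search `~/mylammps/lib/gpu` at run time unless that directory is on the rpath, in `LD_LIBRARY_PATH`, or in the loader cache, and the link target is the `stubs` copy of the library, which is only meant to satisfy the linker. A minimal sketch that resolves a link and flags a stub target; the helper name is hypothetical, and it assumes `readlink -f` as provided by GNU coreutils or BusyBox:

```shell
#!/bin/sh
# Hypothetical helper: resolve a library path (following symlinks) and
# report whether the final target lies inside a CUDA "stubs" directory.
# Returns 0 (true) if the target is a stub, 1 otherwise.
points_to_stub() {
    target=$(readlink -f "$1") || return 1
    case "$target" in
        */stubs/*)
            echo "$1 -> $target (link-time stub, not usable at run time)"
            return 0 ;;
        *)
            echo "$1 -> $target"
            return 1 ;;
    esac
}

# Example usage inside ~/mylammps/lib/gpu:
#   points_to_stub libcuda.so.1
```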

Further, following Axel’s suggestion in one of the previous posts, I also added the -Wl,-rpath flags to Makefile.lammps. The following is the Makefile.lammps from my lib/gpu folder:

gpu_SYSINC =
gpu_SYSLIB = -lcudart -lcuda
gpu_SYSPATH = -L/panfs/roc/msisoft/cuda/8.0/lib64/stubs -Wl,-rpath,/path/panfs/roc/msisoft/cuda/8.0/lib64/stubs
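To check which run-time search path, if any, the `-Wl,-rpath` flag actually embedded in the executable, one can inspect its dynamic section with `readelf -d` from GNU binutils. A minimal sketch with a hypothetical helper name that extracts the RPATH/RUNPATH entries from `readelf -d` output and prints one directory per line:

```shell
#!/bin/sh
# Hypothetical helper: read `readelf -d` output on stdin and print each
# directory stored in an RPATH or RUNPATH entry, one per line.
embedded_rpaths() {
    # Matching lines look like:
    #  0x000000000000000f (RPATH)  Library rpath: [/dir/one:/dir/two]
    grep -E '\((RPATH|RUNPATH)\)' |
        sed 's/.*\[\(.*\)\].*/\1/' |
        tr ':' '\n'
}

# Example usage on the executable from this thread:
#   readelf -d ./lmp_mesabi | embedded_rpaths
```

This makes it easy to compare the directory that ended up in the binary with the one that was intended.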

I also added the stubs directory to LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/panfs/roc/msisoft/cuda/8.0/lib64/stubs

It would be great if anyone could point out what I am doing wrong, or whether I am missing something obvious.

More details

ARCHITECTURE: MESABI SUPERCOMPUTER - KEPLER K40 nodes sm_35

LAMMPS Version: LAMMPS (22 Sep 2017)

Thanks in advance

Vaidyanathan M S

Postdoctoral Research Assistant

University of Minnesota, Twin Cities

Stefan_Paquay · November 15, 2017, 6:43pm

I don’t know for sure, but my guess is that the local directory is not searched for libraries at run time. Why is there no libcuda.so in a sensible location? Mine is in /usr/lib/. You probably don’t want your executable to actually use the stubs either.

akohlmey · November 15, 2017, 6:43pm

> Further, as per Axel's suggestion in one of the previous posts, I added the -Wl,-rpath links to the Makefile.lammps too. The following is my lammps Makefile from GPU folder
>
> gpu_SYSINC =
> gpu_SYSLIB = -lcudart -lcuda
> gpu_SYSPATH = -L/panfs/roc/msisoft/cuda/8.0/lib64/stubs -Wl,-rpath,/path/panfs/roc/msisoft/cuda/8.0/lib64/stubs

This is *incorrect*! You *must* not point to the "stubs" folder. The stub library is sufficient to compile and link your executable, but not to run it. libcuda.so.1 is provided by the CUDA *driver*, which is part of the GPU driver and is usually installed only on compute nodes that have GPUs.

This is neither a LAMMPS nor a GPU issue, but a local setup issue. Please contact your local HPC support folks.
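Because libcuda.so.1 comes with the driver, a job script can check for it before launching LAMMPS. The sketch below is an illustration, not a cluster-specific recipe: the helper name `find_libcuda` is hypothetical, and the default directory list (`LD_LIBRARY_PATH` plus `/usr/lib64`, `/usr/lib` and `/usr/lib/x86_64-linux-gnu`) is only an assumption about common driver locations. It deliberately skips any `stubs` directory:

```shell
#!/bin/sh
# Hypothetical helper: search a colon-separated directory list for the
# driver-provided libcuda.so.1, ignoring CUDA "stubs" directories.
# Prints the first match and returns 0, or returns 1 if none is found.
find_libcuda() {
    # Assumed default search list; adjust it for the local cluster.
    dirs=${1:-${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/usr/lib64:/usr/lib:/usr/lib/x86_64-linux-gnu}
    old_ifs=$IFS
    IFS=:
    for d in $dirs; do
        case "$d" in
            # Skip empty entries and link-time stubs (useless at run time).
            ''|*/stubs|*/stubs/) continue ;;
        esac
        if [ -e "$d/libcuda.so.1" ]; then
            IFS=$old_ifs
            echo "$d/libcuda.so.1"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Example usage at the top of a GPU job script:
#   find_libcuda || { echo "no CUDA driver on $(hostname)" >&2; exit 1; }
```

On a node without the GPU driver this would be expected to return 1.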

> Also, I added the path to LD_LIBRARY_PATH
>
> export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/panfs/roc/msisoft/cuda/8.0/lib64/stubs
>
> It would be great if anyone could point out what I am doing wrong or missing something obvious.

This is also wrong: the stubs directory does not belong in LD_LIBRARY_PATH either.
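If a stubs directory has already been appended to `LD_LIBRARY_PATH` (for example in a shell startup file), it can be filtered back out. A minimal POSIX shell sketch with a hypothetical helper name that drops every entry ending in `/stubs`, plus empty entries, while keeping the order of the rest:

```shell
#!/bin/sh
# Hypothetical helper: print a colon-separated path list with every
# ".../stubs" entry and every empty entry removed, preserving order.
strip_stubs() {
    printf '%s\n' "$1" | tr ':' '\n' |
        grep -v -e '/stubs/*$' -e '^$' |
        paste -s -d ':' -
}

# Example usage, e.g. after the CUDA module has been loaded:
#   LD_LIBRARY_PATH=$(strip_stubs "$LD_LIBRARY_PATH"); export LD_LIBRARY_PATH
```

Empty entries are dropped too because an empty `LD_LIBRARY_PATH` entry means the current directory, which the `export` line above would create whenever the variable started out empty.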

axel.

_Vaidyanathan_M.S · November 15, 2017, 7:55pm

I work on HPC systems, so the libraries are in the cluster's default locations rather than in /usr/lib. In a follow-up message, though, Axel suggested contacting the local HPC support staff, so I will do that. I just wanted to make sure I was not misunderstanding anything about how LAMMPS links to the GPU libraries before contacting them.

Thanks for the help.

_Vaidyanathan_M.S · November 15, 2017, 8:37pm

It worked when I ran on the compute nodes instead of the main (login) nodes! I had assumed that loading the GPU modules on the front end would be enough.

Thanks Axel, Stefan