[lammps-users] linking errors when building gpu lammps

Dear LAMMPS GPU developers and users,

I have a linking problem with building the GPU LAMMPS project:

[[email protected]... src]$ make linux
make[1]: Entering directory `/home/ndtrung/codes/lammps-gpu/src/Obj_linux'
mpic++ -O -L../../lib/gpu -L/home/ndtrung/codes/fftw-2.1.5/build/lib -L/opt/cuda3.0/lib64 angle_charmm.o angle_cosine.o ...

../../lib/gpu/libgpu.a(pair_gpu_device.o): In function `lmp_gpu_forces(double**, double**, double*, double**, double*, double&)':
pair_gpu_device.cpp:(.text+0xa4): undefined reference to `cuStreamSynchronize'
../../lib/gpu/libgpu.a(pair_gpu_device.o): In function `ucl_cudadr::UCL_Timer::clear()':
pair_gpu_device.cpp:(.text._ZN10ucl_cudadr9UCL_Timer5clearEv[ucl_cudadr::UCL_Timer::clear()]+0x14): undefined reference to `cuEventDestroy'
pair_gpu_device.cpp:(.text._ZN10ucl_cudadr9UCL_Timer5clearEv[ucl_cudadr::UCL_Timer::clear()]+0x21): undefined reference to `cuEventDestroy'
pair_gpu_device.cpp:(.text._ZN10ucl_cudadr9UCL_Timer5clearEv[ucl_cudadr::UCL_Timer::clear()]+0x6b): undefined reference to `cuEventDestroy'
../../lib/gpu/libgpu.a(pair_gpu_device.o): In function `ucl_cudadr::UCL_H_Vec<float>::~UCL_H_Vec()':
pair_gpu_device.cpp:(.text._ZN10ucl_cudadr9UCL_H_VecIfED1Ev[ucl_cudadr::UCL_H_Vec<float>::~UCL_H_Vec()]+0x1d): undefined reference to `cuMemFreeHost'

...

I should mention that before building LAMMPS, I already built lib/gpu/libgpu.a by checking out the source (revision 577) from:

svn checkout http://gpulammps.googlecode.com/svn/trunk/ lammps-gpu

and modifying the Makefile.linux file to point to the CUDA toolkit. Roughly, the variables I touched look like this (quoting from memory; the exact names may differ in that revision):
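
CUDA_HOME = /opt/cuda3.0
NVCC = $(CUDA_HOME)/bin/nvcc
CUDA_INCLUDE = -I$(CUDA_HOME)/include
CUDA_LIB = -L$(CUDA_HOME)/lib64

Running nvc_get_devices then showed: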

Using platform: NVIDIA Corporation NVIDIA CUDA
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.0

Device 0: "GeForce GTX 480"
   Type of device: GPU
   Compute capability: 2
   Double precision support: Yes

...

Device 1: "nForce 980a/780a SLI"
   Type of device: GPU
   Compute capability: 1.1
   Double precision support: No

(One note: in geryon/nvc_device.h, line 293, the CUDART_VERSION check should be "> 3000", because the ECCEnabled field of cudaDeviceProp is not available in CUDA 3.0 and older.)
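
What I mean is a guard roughly like this (my own paraphrase, not the actual geryon source):

#include <cuda_runtime.h>

// Only touch ECCEnabled when the runtime headers actually define it;
// the field is missing from cudaDeviceProp in CUDA 3.0 and older.
inline bool device_has_ecc(const cudaDeviceProp &prop) {
#if CUDART_VERSION > 3000
  return prop.ECCEnabled != 0;
#else
  return false;
#endif
}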

Next, I modified MAKE/Makefile.linux in the LAMMPS src to point to the CUDA lib path on my machine (the same one I used for building libgpu):

gpu_SYSLIB = -lcudart
...
gpu_SYSPATH = -L/opt/cuda3.0/lib64

and run

make yes-gpu
make linux

Everything went smoothly until the linking step, which produced the errors shown above. They seem to indicate that the linker could not find the CUDA library (libcudart), though I don't know why.

Did I miss something important and is there further information I should provide?

Please comment and help. Thanks,

-Trung

trung,

first recommendation. please upgrade to cuda-3.2
(just released). i am currently having a hell of a time
trying to make something work on NCSA's lincoln with 3.0.
3.1 on longhorn and 3.2rc2 on multiple local machines
have been working very well.

second hint, the GPU module not only needs to be
linked to libcudart.so, but also to the matching libcuda.so.
that one is often not installed on cluster frontends, or
only in an incompatible version. my recommendation is to
copy it from a compute node to your lib/gpu directory.
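
something along these lines should do ("computenode", the driver
version, and the paths are just placeholders for what your cluster has):

scp computenode:/usr/lib64/libcuda.so.260.19.21 ~/codes/lammps-gpu/lib/gpu/
cd ~/codes/lammps-gpu/lib/gpu
ln -s libcuda.so.260.19.21 libcuda.so

then add that directory to gpu_SYSPATH (-L...) and -lcuda to
gpu_SYSLIB so the linker can pick it up.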

third suggestion. the gpulammps repository has some
very experimental code. i have cut out those parts and
updated the rest to be compatible with the current mainline
lammps code and added it to my lammps-icms branch.
so i would recommend using that branch instead, if you
only intend to use the GPU module. this way, you should
be fully up-to-date with all the features and bugfixes from
the mainline code. http://goo.gl/oKYI

of course, you are free to use the development code...

cheers,
    axel.

Thanks a lot, Axel.

Your second hint solved the linking problem: I added -lcuda to gpu_SYSLIB and the errors went away.
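
For the record, the relevant lines in src/MAKE/Makefile.linux now read:

gpu_SYSLIB = -lcudart -lcuda
...
gpu_SYSPATH = -L/opt/cuda3.0/lib64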

I tried cuda-3.2.12 on the machine and got similar linking errors (now I know why :-))

Now I ran an example (in.lj) and got the following errors:

Setting up run ...
lmp_linux: pair_gpu_cell.cu:486: void build_cell_list(double*, int*, cell_list&, int, int, int, int, int, int, int, int): Assertion `err == cudaSuccess' failed.
[glotzgpu2:09443] *** Process received signal ***
[glotzgpu2:09443] Signal: Aborted (6)
[glotzgpu2:09443] Signal code: (-6)
[glotzgpu2:09443] [ 0] /lib64/libpthread.so.0 [0x3644e0eb10]
[glotzgpu2:09443] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3644230265]
[glotzgpu2:09443] [ 2] /lib64/libc.so.6(abort+0x110) [0x3644231d10]
[glotzgpu2:09443] [ 3] /lib64/libc.so.6(__assert_fail+0xf6) [0x36442296e6]
[glotzgpu2:09443] [ 4] /home/ndtrung/codes/lammps/src/lmp_linux [0x67056c]
[glotzgpu2:09443] [ 5] /home/ndtrung/codes/lammps/src/lmp_linux(_Z12_lj_gpu_cellIffEdR13LJ_GPU_MemoryIT_T0_EPPdS5_S6_PiiiibbPKdS9_+0x125) [0x669805]
...

As I recall, these errors are similar to those reported a while ago with the mainline lammps gpu package. I think I'm gonna try your lammps-icms branch...

-Trung

Quoting Axel Kohlmeyer <[email protected]>:

trung,

please try the input examples from this directory first:

http://code.google.com/p/gpulammps/source/browse/#svn/trunk/bench/single_gpu

i have validated and updated them with the current gpulammps code last weekend.
i will make a full set of reference runs - done on TACC's longhorn yesterday -
available later today, too.

thanks,
    axel.

Great! I tried lj-gpu.in in bench/single_gpu and it seems to work well:

[[email protected]... single_gpu]$ lmp_linux -in lj-gpu.in -var len 16
LAMMPS (5 Jun 2010)
Lattice spacing in x,y,z = 1.6796 1.6796 1.6796
Created orthogonal box = (0 0 0) to (26.8735 26.8735 26.8735)
   1 by 1 by 1 processor grid
Created 16384 atoms

trung,

[...]

FYI, I built libgpu and lammps with cuda 3.2 successfully but got a runtime
error with geryon/nvd_kernel.h at the function call cuFuncSetBlockShape()
(line 206). I then realized that the CUDA runtime installed on the machine is
version 3.0 (as reported by nvc_get_devices), which may explain the runtime
error. So I reverted to cuda-3.0, and the binary works fine.

why didn't you just copy the runtime library and set LD_LIBRARY_PATH?
you did seem to have a suitable driver...
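
i.e. copy libcudart.so.3* from a cuda-3.2 toolkit to some convenient
directory and point the loader at it, e.g. (the path is just an
example of where your 3.2 toolkit might live):

export LD_LIBRARY_PATH=/opt/cuda3.2/lib64:$LD_LIBRARY_PATH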

In summary, I think there are two issues to be addressed in the GPU LAMMPS
building instructions (correct me if they are already covered and I just
missed them):

1- the cuda library to build gpu and lammps against should match the actual
runtime version on the machine (shown by nvc_get_devices), and

actually, that dependency only pertains to the code imported from the
cuda performance primitives package. you'll see that those are the only
.cu_o files in obj. that will go away in a future version and then only
libcuda will be needed. your situation is very unusual. if you compile
with nvcc from cuda 3.2, then you should also use the runtime from
cuda 3.2. the dynamic linker cannot tell them apart, since the "soname"
is libcudart.so.3 in both cases. this is quite normal, since only backward
but not forward compatibility is provided. if you compiled and linked
a binary on a new linux distribution with a new glibc, you would also not
be able to run that binary on an older distribution with an older version
of glibc. in my opinion, that is more of a PEPCAC situation than a
problem of the code or the documentation.

nevertheless, i'll have a look and see if there is an easy way
to test for this condition and make lammps terminate with a more
meaningful error message.
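
something along these lines might do (an untested sketch, not what
is in geryon right now):

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// bail out early with a readable message when the installed driver
// is older than the runtime the binary was linked against.
static void check_cuda_versions() {
  int driver = 0, runtime = 0;
  cudaDriverGetVersion(&driver);    // highest version libcuda.so supports
  cudaRuntimeGetVersion(&runtime);  // version of libcudart.so in use
  if (driver < runtime) {
    fprintf(stderr, "CUDA driver (%d) is older than the CUDA runtime (%d); "
                    "update the driver or link against a matching toolkit.\n",
            driver, runtime);
    exit(1);
  }
}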

2- add -lcuda to gpu_SYSLIB in src/MAKE/Makefile.*, and also make sure
that libcuda.so is in ${CUDA_HOME}/lib64 (or wherever libcudart.so is
located).

actually, the canonical location for libcuda.so is /usr/lib64/
(or /usr/lib64/nvidia if you use rpms from rpmfusion).
it comes with the nvidia driver and has to match the kernel
module. this is different from the cuda runtime library, which
ships with the cuda toolkit and will dlopen() libcuda.so as needed.
of course, it would be very convenient to have a suitable
symbolic link in your installation (as with the older libcudart.so.* files).
this is something that is best communicated to and handled
by your local system administrator.
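
a quick way to see which cuda libraries your binary actually picks
up at run time (adjust the path to wherever your lmp_linux lives):

ldd /home/ndtrung/codes/lammps/src/lmp_linux | grep -i cuda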

cheers,
    axel.

Oops, my bad! I got the location of libcuda.so wrong; as you pointed out, it is /usr/lib64 on the machine I'm working on.

You're right. I just need to add to LD_LIBRARY_PATH the path to cuda-3.2 to make the binary work well. Do you mean this is a PEBKAC situation? If yes, I agree.

Cheers,
-Trung

Oops, my bad! I got the location of libcuda.so wrong; as you pointed out,
it is /usr/lib64 on the machine I'm working on.

no big deal. as a matter of convenience, it is actually a
good idea to have a symlink for it in the cuda toolkit
library directories.
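
e.g. (assuming your 3.2 toolkit sits in /opt/cuda3.2; adjust as needed):

ln -s /usr/lib64/libcuda.so /opt/cuda3.2/lib64/libcuda.so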

You're right. I just need to add to LD_LIBRARY_PATH the path to cuda-3.2 to
make the binary work well. Do you mean this is a PEBKAC situation? If yes, I
agree.

yes, that is what i meant.

exactly. in fact, having this happen to you and having talked about
it "in the open" will hopefully help others to avoid these pitfalls.

for people with a little bit more experience with how the library
resolution process works on modern operating systems, it might
be worth considering the following alternate solution: you can
also "encode" preferred LD_LIBRARY_PATH entries into a binary.
this is done with, e.g. -Wl,-rpath,/usr/local/cuda-3.2/lib64

this way you avoid confusion in case you have multiple versions
of the cuda toolkit installed and want to test them side by side.
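
in the lammps makefile that could look like this (the toolkit path is
again just an example):

gpu_SYSPATH = -L/usr/local/cuda-3.2/lib64 -Wl,-rpath,/usr/local/cuda-3.2/lib64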

cheers,
     axel.

The experimental code in the repository should have no effect on users
unless they attempt to build with OpenMP flags (that are nowhere in the
repository makefiles). The error you report is from a file that should no
longer exist in the repository (google, not main lammps).

Where did you get this code?

I think the main advantage of the ICMS distribution (for the GPU library)
is that the rest of the code is up to date with LAMMPS.

Regarding the runtime/driver issues: I also find this very annoying. A
couple of routines need to be changed and then the user should no longer
have to use the cuda compiler at all to build lammps with gpu support.
This should (hopefully) make the build process much, much easier and will
also make the OpenCL build fully functional. This change is
straightforward, but part of a long list.

We have a paper describing the GPU library that has been accepted pending
minor revision. I can send this to anyone interested in trying out the
library who would like to know how it actually works. Will probably
add CHARMM/Gromacs very soon...

- Mike

Hi Mike,

I mixed up the binary from the main lammps and the one from the google repository. The error was actually from the main lammps, not from the experimental code.

I'd appreciate it if you could send me a copy of that paper. Thanks,

-Trung

Quoting "Brown, W. Michael" <[email protected]...>: