Seg Fault when Compiling GPU Library

pbarc · September 17, 2019, 11:44pm

Hi Lammps Users,

I am trying to run LAMMPS (7 Aug 2019) with my GPU (NVIDIA Quadro P2000, 5GB), but am running into some issues compiling lib/gpu. I have nvcc version 10.1.243, but when I run nvidia-smi (see full output below) it says CUDA Version 10.2 (do you think this is a potential issue?).

In the lammps/src directory I try compiling the GPU library with
make lib-gpu args="-a sm_70 -b"

but get the following error.

mpicxx -DMPI_GERYON -DUCL_NO_EXIT -DMPICH_IGNORE_CXX_SEEK -DOMPI_SKIP_MPICXX=1 -fPIC -O2 -DLAMMPS_SMALLBIG -D_SINGLE_DOUBLE -I/usr/local/cuda/include -DUSE_CUDPP -Icudpp_mini -o lal_lj_sdk_long.o -c lal_lj_sdk_long.cpp -I./
Segmentation fault (core dumped)
Nvidia.makefile:166: recipe for target 'radixsort_app.cu_o' failed

I tried changing the architecture flag, but get similar errors.

I also tried making from the lammps/lib/gpu directory with
make -f Makefile.linux clean
make -f Makefile.linux

but I get a similar error

mpicxx -DMPI_GERYON -DUCL_NO_EXIT -DMPICH_IGNORE_CXX_SEEK -DOMPI_SKIP_MPICXX=1 -fPIC -O2 -DLAMMPS_SMALLBIG -D_SINGLE_DOUBLE -I/usr/local/cuda/include -DUSE_CUDPP -Icudpp_mini -o cudpp_plan_manager.o -c cudpp_mini/cudpp_plan_manager.cpp -Icudpp_mini
nvcc -I/usr/local/cuda/include -DUNIX -O3 --use_fast_math -DLAMMPS_SMALLBIG -Xcompiler -fPIC -Icudpp_mini -arch=sm_70 -D_SINGLE_DOUBLE -o radixsort_app.cu_o -c cudpp_mini/radixsort_app.cu
Segmentation fault (core dumped)
Nvidia.makefile:166: recipe for target 'radixsort_app.cu_o' failed
make: *** [radixsort_app.cu_o] Error 139

My CUDA_HOME=/usr/local/cuda. I have limited experience compiling with nvcc for GPUs, but I was able to successfully compile and run "Hello World!" examples with nvcc.

Any help would be greatly appreciated, as I am at a loss as to how to fix this issue.

Thanks for your help, and please let me know if I should provide more info.

Best,
Paul Barclay

nvidia-smi

pbarc · September 17, 2019, 11:49pm

I forgot to include my OS info. It's Ubuntu 18.04.3

akohlmey · September 18, 2019, 12:35pm

> Hi Lammps Users,
>
> I am trying to run LAMMPS (7 Aug 2019) with my GPU (NVIDIA Quadro P2000,
> 5GB), but am running into some issues compiling lib/gpu. I have nvcc
> version 10.1.243, but when I run nvidia-smi (see full output below) it
> says CUDA Version 10.2 (do you think this is a potential issue?).

difficult to say from remote.

> In the lammps/src directory I try compiling the GPU library with
> make lib-gpu args="-a sm_70 -b"
>
> but get the following error.
>
> mpicxx -DMPI_GERYON -DUCL_NO_EXIT -DMPICH_IGNORE_CXX_SEEK
> -DOMPI_SKIP_MPICXX=1 -fPIC -O2 -DLAMMPS_SMALLBIG -D_SINGLE_DOUBLE
> -I/usr/local/cuda/include -DUSE_CUDPP -Icudpp_mini -o lal_lj_sdk_long.o
> -c lal_lj_sdk_long.cpp -I./
> Segmentation fault (core dumped)
> Nvidia.makefile:166: recipe for target 'radixsort_app.cu_o' failed
>
> I tried changing the architecture flag, but get similar errors.

a compiler should not terminate on a segmentation fault. this is a local problem and not a LAMMPS problem.
possible reasons for the segfault are:

- incorrect/inconsistent software installation
- hardware issues (bad RAM, bad cooling, bad CPU, corrupted disk)
- local settings (e.g. too small stack size)
- broken/incompatible software
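
A few of these causes can be checked quickly from the shell. The following is a minimal sketch, not a complete diagnosis: it prints the current stack size limit and shows which nvcc and mpicxx (the tool names taken from the build log in this thread) are found on the PATH. The function name report_build_env is made up for illustration. Note also that nvidia-smi reports the highest CUDA version the installed driver supports, while nvcc --version reports the installed toolkit, so the two numbers can legitimately differ.

```shell
# Print the settings most relevant to a crashing compiler.
# Assumption: nvcc and mpicxx are the tools in use, as in the build log above.
report_build_env() {
    # "unlimited" or a size in kilobytes; a very small value can crash compilers
    echo "stack size limit: $(ulimit -s)"
    for tool in nvcc mpicxx; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: $(command -v "$tool")"
        else
            echo "$tool: not found on PATH"
        fi
    done
}

report_build_env
```

If nvcc resolves to an unexpected location (for example a distribution package instead of /usr/local/cuda/bin), that would point toward an inconsistent installation.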

> I also tried making from the lammps/lib/gpu directory with
> make -f Makefile.linux clean
> make -f Makefile.linux
>
> but I get a similar error

that is just calling the same commands. so no surprise you get the same error.

you should first check whether you can compile and run some of the CUDA demo examples, e.g. the manybody demo. if this already fails, then you need to check your installation and/or hardware.
similarly, you should check whether you can compile LAMMPS without GPU support ('make serial' with no packages installed should work on any Linux machine with a working GNU C++ compiler installation; likewise, 'make mpi' should work with any properly installed MPI library).
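
The idea of narrowing the problem down one build at a time can be sketched as a small shell helper that runs a list of commands in order and stops at the first one that fails. This is only an illustration; the function name run_until_failure is hypothetical, and the make targets in the usage comment are the ones named above.

```shell
# Run each command string in order; stop and report at the first failure.
run_until_failure() {
    for step in "$@"; do
        echo "running: $step"
        # A crash such as a segmentation fault also counts as a failure here.
        if ! sh -c "$step"; then
            echo "FAILED: $step"
            return 1
        fi
    done
    echo "all steps passed"
}

# Usage from the lammps/src directory (not executed here):
#   run_until_failure "make serial" "make mpi"
```

The first step reported as FAILED indicates which layer (plain C++ compiler, MPI, or CUDA) deserves a closer look.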

if everything works, you could try modifying lib/gpu/Makefile.linux by setting

CUDPP_OPT =
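
As a hedged sketch of that edit: the build log in this thread shows -DUSE_CUDPP -Icudpp_mini on the compile lines, which is presumably what CUDPP_OPT is set to by default, and leaving the variable empty should keep the cudpp_mini code (including the radixsort_app.cu file that crashes nvcc) out of the build. The helper below, with the made-up name disable_cudpp, blanks the assignment and keeps a backup copy of the makefile.

```shell
# Blank out the CUDPP_OPT assignment in a makefile, keeping a .bak copy.
# Assumption: the makefile contains a line starting with "CUDPP_OPT".
disable_cudpp() {
    makefile="$1"
    cp "$makefile" "$makefile.bak" || return 1
    # Rewrite only the CUDPP_OPT line; all other lines pass through unchanged.
    sed 's/^CUDPP_OPT[[:space:]]*=.*/CUDPP_OPT =/' "$makefile.bak" > "$makefile"
}

# Usage from the lammps directory (not executed here):
#   disable_cudpp lib/gpu/Makefile.linux
#   (cd lib/gpu && make -f Makefile.linux clean && make -f Makefile.linux)
```

This only works around the crash; it does not explain why nvcc segfaults in the first place.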

axel.

pbarc · September 18, 2019, 2:11pm

Thanks for your advice, Axel.

I will try to compile the CUDA examples that you mentioned and see if I get similar errors.

Best,
Paul

Bruce_Fan · September 18, 2019, 2:45pm

I would like to point out (although this is not the reason for the compile errors) that the Quadro P2000 has compute capability 6.1, so code compiled with the -arch=sm_70 flag for nvcc will not run on it. You can change the flag to sm_60 or sm_61.
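
The mapping from compute capability to the architecture name is mechanical: drop the dot and prefix sm_. A tiny sketch, where the helper name cc_to_arch is made up for illustration:

```shell
# Convert a compute capability such as "6.1" into an nvcc architecture name.
cc_to_arch() {
    echo "sm_$(echo "$1" | tr -d '.')"
}

cc_to_arch 6.1   # prints sm_61

# Usage with the build command from the original post (not executed here):
#   make lib-gpu args="-a $(cc_to_arch 6.1) -b"
```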

Bruce