CUDA driver error 500 in call at file 'geryon/nvd_texture.h' when running LAMMPS

I am trying to set up a number of LAMMPS test cases to evaluate performance on our GPU systems. I have built LAMMPS with GPU support and tried to run the bench_lj benchmark, which I downloaded some time ago from the Benchmarks website. I am running it like this:
mpiexec -np 1 lmp_nas.v100 -sf gpu -pk gpu 1 -v x 4 -v y 4 -v z 8 -v t 1000
and get the error above with the following stack trace:
MPT: #5 0x000000000084318d in LAMMPS_AL::Neighbor::init(LAMMPS_AL::NeighborShared*, int, int, int, int, ucl_cudadr::UCL_Device&, int, int, bool, int, int, int, int, int, bool, std::string const&, bool) ()
MPT: #6 0x000000000082ec7c in LAMMPS_AL::Device<float, double>::init_nbor(LAMMPS_AL::Neighbor*, int, int, int, int, int, int, double, bool, int, bool) ()
MPT: #7 0x0000000000860eb3 in LAMMPS_AL::BaseAtomic<float, double>::init_atomic(int, int, int, int, double, double, _IO_FILE*, void const*, char const*, int) ()
MPT: #8 0x000000000084cce9 in LAMMPS_AL::LJ<float, double>::init(int, double**, double**, double**, double**, double**, double**, double*, int, int, int, int, double, double, _IO_FILE*) ()
MPT: #9 0x00000000008399ea in ljl_gpu_init(int, double**, double**, double**, double**, double**, double**, double*, int, int, int, int, double, int&, _IO_FILE*) ()
MPT: #10 0x0000000000760e32 in LAMMPS_NS::PairLJCutGPU::init_style (this=0x1e2888e0)

I googled around and found a similar problem reported; the advice there was to consult the LAMMPS developers.

  1. Am I running the test correctly? Maybe my flags are incorrect?
  2. Have you seen similar problems reported in the past? Maybe there is some sort of workaround?
  3. Could you advise on some basic sanity tests to check if my installation is ok?

Many thanks in advance for any guidance on this issue, Gabriele

Can you give more info: which GPU are you running on, how did you compile LAMMPS, what CUDA version, etc.?

Thanks for getting back to me, Stan. I actually got it running by rebuilding with a more recent gcc. I ran the bench_lj test on both GPU and CPU, and both completed.
mpiexec -np ${i} {OP_SCOPE_EXE} -sf gpu -pk gpu ${i} -log loggpu-lj-${i}rank.SingleNode -v x 4 -v y 4 -v z 8 -v t 1000 < in.lj > outgpu-lj-${i}rank.SingleNode
mpiexec -np ${i} {OP_SCOPE_EXE} -sf cpu -log logcpu-lj-${i}rank.SingleNode -v x 4 -v y 4 -v z 8 -v t 1000 < in.lj > outcpu-lj-${i}rank.SingleNode
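For reference, a rank scan like the one above could be wrapped in a loop along these lines (a sketch, assuming bash; the binary name lmp_nas.v100 is taken from the first message, and the echo is there so the expanded commands can be checked before actually launching anything):

```shell
# Hypothetical driver loop for the GPU runs above -- a sketch, not the exact
# script used. 'echo' prints each expanded command so the ${i} substitutions
# can be verified; drop the 'echo' to actually launch the runs.
for i in 1 2 4; do
    echo mpiexec -np "${i}" lmp_nas.v100 -sf gpu -pk gpu 1 \
        -log "loggpu-lj-${i}rank.SingleNode" \
        -v x 4 -v y 4 -v z 8 -v t 1000
done
```

Echoing first is a cheap way to catch unexpanded variables (e.g. a literal `i` where `${i}` was intended) before burning node time.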
Does this look correct? How can I check the output for correct execution? The performance looks like this:
outcpu-lj-1rank.SingleNode:Performance: 205.246 tau/day, 0.475 timesteps/s
outcpu-lj-2rank.SingleNode:Performance: 393.063 tau/day, 0.910 timesteps/s
outcpu-lj-4rank.SingleNode:Performance: 765.820 tau/day, 1.773 timesteps/s
outgpu-lj-1rank.SingleNode:Performance: 3422.075 tau/day, 7.921 timesteps/s
outgpu-lj-2rank.SingleNode:Performance: 6193.604 tau/day, 14.337 timesteps/s
outgpu-lj-4rank.SingleNode:Performance: 11196.732 tau/day, 25.918 timesteps/s

If you have pointers to some more interesting test cases, that would be great.
Thanks and greetings, Gabriele