Error in running LAMMPS on a GPU cluster

I am simulating a bead-spring model of a polymer in LAMMPS, with FENE interactions between bonded atoms and Lennard-Jones interactions between non-bonded atoms. I am running this on a GPU cluster. Here is my input script.

############################################################
# LAMMPS POLYMER MOLECULAR DYNAMICS
#Bead spring model of polymer with FENE and LJ interaction
############################################################
package gpu 1

# Box and units (use LJ units and periodic boundaries)

units lj # lennard-jones units
atom_style bond # atoms with bonds
boundary p p p # all boundaries are periodic

read_data initial_configuration.txt

# FENE potential and LJ interaction

# Between bonded atoms

bond_style fene
special_bonds fene # Prevents LJ from being counted twice
bond_coeff 1 10.0 1.5 0.305 1.0

# Between non-bonded atoms

pair_style lj/cut/gpu 2.5
pair_modify shift yes
pair_coeff 1 1 1.0 1.0 1.122461
pair_coeff 1 2 1.0 1.0 1.122461
pair_coeff 2 2 2.0 1.0 2.5

# Set up fixes

variable seed equal 54654651

fix 1 all nve # NVE integrator
fix 2 all langevin 1.0 1.0 1.0 ${seed} # langevin thermostat

# Output thermodynamic info (temperature, energy, pressure, etc.)

thermo 1000
thermo_style custom step temp etotal pe ke epair emol press vol

# Set timestep of integrator

timestep 0.005

# Run

dump 1 all atom 10000 dump.lammpstrj
dump 2 all custom 10000 dump.position.lammpsbin id type x y z vx vy vz

run 1000000
########## Input script end here #########

There are two types of beads in the polymer. The system has 512 beads and 511 bonds.

My simulation works without the GPU, but when I run it on the GPU, it runs for only 12000 steps and then stops with the following error.

Cuda driver error 700 in call at file '/lfs/sware/lammps/lammps-2Aug2023/lib/gpu/geryon/nvd_timer.h' in line 76.
Abort(-1) on node 1 (rank 1 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1

I also tried changing the timestep, but it still stops after exactly 12000 steps.
I am running this on 4 CPUs and 1 GPU node.

Please try to use some of the benchmark or example inputs bundled with LAMMPS.
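As a rough illustration of such a test, here is a minimal sketch modeled on the Lennard-Jones melt benchmark (bench/in.lj in the LAMMPS distribution), with the GPU package switched on at the top. The box size and run length are arbitrary choices for a quick check, not tuned values. If an input like this also crashes on the GPU, the problem is more likely in the installation than in the polymer input.

```
# Sketch of a GPU sanity check, modeled on the bundled LJ melt benchmark.
# Assumes LAMMPS was built with the GPU package.
package gpu 1          # must appear before the simulation box is created
suffix gpu             # use /gpu variants of styles where available

units lj
atom_style atomic

lattice fcc 0.8442
region box block 0 20 0 20 0 20
create_box 1 box
create_atoms 1 box
mass 1 1.0

velocity all create 1.44 87287 loop geom

pair_style lj/cut 2.5  # becomes lj/cut/gpu through the suffix command
pair_coeff 1 1 1.0 1.0 2.5

neighbor 0.3 bin
neigh_modify delay 0 every 20 check no

fix 1 all nve

thermo 100
run 1000
```

The bench folder should also contain a bead-spring chain benchmark (in.chain), which is closer to the system in question.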

Please note that your quoted input is difficult to read because you are not quoting it properly. See the forum guidelines.

The error you quote comes from a low-level mismatch in the GPU library and is thus difficult to debug. To give meaningful advice, exact details are needed describing how LAMMPS was configured and compiled, which versions of tools and libraries you use, and what hardware you have.
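Part of this information can be collected from within LAMMPS itself. A minimal sketch, assuming a reasonably recent LAMMPS version: adding the info command near the top of the input writes the version, compiler and installed packages to the screen and log file.

```
# Print LAMMPS version, OS/compiler details, and the list of installed packages
info config
```

Running the LAMMPS executable with the -h flag should print similar details. Hardware and driver versions still have to be reported separately.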

You could also try the KOKKOS package; there has been some recent work to speed up bead-spring models.
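A rough sketch of what that switch could look like, assuming LAMMPS was built with the KOKKOS package and CUDA support. The executable name lmp and the input file name in.polymer are placeholders. KOKKOS is enabled from the command line, so the input itself drops the package gpu line and the explicit /gpu suffix.

```
# Launch with KOKKOS enabled on one GPU, for example:
#   mpirun -np 1 lmp -k on g 1 -sf kk -in in.polymer
# ("lmp" and "in.polymer" are placeholder names)

# In the input script:
#   - remove the line "package gpu 1"
#   - use unsuffixed styles; -sf kk selects /kk variants where they exist

bond_style fene
special_bonds fene
bond_coeff 1 10.0 1.5 0.305 1.0

pair_style lj/cut 2.5
pair_modify shift yes
pair_coeff 1 1 1.0 1.0 1.122461
pair_coeff 1 2 1.0 1.0 1.122461
pair_coeff 2 2 2.0 1.0 2.5
```

With KOKKOS, one MPI rank per GPU is the usual starting point, so the rank count here differs from the 4-CPU run described in the question.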