Error when running a large system with GPU

Dear developers,
These days I came across an interesting problem. When I simulate a large system of 1,000,000 atoms, I get an error that says "Cuda driver error 1 in call at file '/root/Public/lammps/lib/gpu/geryon/nvd_kernel.h' in line 364.". Then I tested some other situations:
A. Run the same system on CPU only: mpirun -np 16 lmp_mpi -in system.in. It works.
B. Run the same system on CPU & GPU: mpirun -np 16 lmp_gpu -sf gpu -pk gpu 1 -in system.in. Error. (An input-script equivalent of the GPU switches is sketched after this list.)
C. Run a smaller system with 800,000 atoms, both on CPU only and on CPU & GPU. Both are OK.
D. Run a bigger system with 1,200,000 atoms. Error.
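
For reference, my understanding is that the -sf gpu and -pk gpu 1 command-line switches correspond to the following commands near the top of the input script (a minimal sketch only; the rest of system.in is assumed unchanged):

package gpu 1    # use 1 GPU per node, same as the -pk gpu 1 switch
suffix gpu       # append the /gpu suffix to supported styles, same as -sf gpu
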
This confuses me because it seems that only systems larger than about one million atoms run into this error, while smaller systems are fine. At first I thought I might have run out of GPU memory, so I kept watching the GPU during the run; it only used 5.8 of 7.9 GB. Since I am not good at CUDA programming, I am sorry to bother you with this question. I hope you can answer it.
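
In case it is useful, here is a minimal sketch of how GPU memory usage can be logged while the run is active (assuming nvidia-smi is available; the one-second polling interval and the gpu_mem.log file name are only examples, not my actual setup):

# poll GPU memory once per second in the background, then run LAMMPS
nvidia-smi --query-gpu=timestamp,memory.used,memory.total --format=csv -l 1 > gpu_mem.log &
mpirun -np 16 lmp_gpu -sf gpu -pk gpu 1 -in system.in
kill %1    # stop the background nvidia-smi logger when the run finishes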

Thanks in advance for your selfless help!
Platform info:
2 × Xeon E5-2683
128 GB memory
CentOS 6.10
GTX 1080 with 8 GB memory

LAMMPS (28 Feb 2019)

Here is the error info:

[root@…436… cpu]# mpirun -np 16 lmp_gpu -sf gpu -pk gpu 1 -in system.in
LAMMPS (28 Feb 2019)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:87)
using 1 OpenMP thread(s) per MPI task
Reading data file …
orthogonal box = (0 0 0) to (1010 1010 1010)
2 by 2 by 4 MPI processor grid
reading atoms …
1200000 atoms
scanning bonds …
1 = max bonds/atom
scanning angles …
1 = max angles/atom
scanning dihedrals …
1 = max dihedrals/atom
reading bonds …
1199400 bonds
reading angles …
1198800 angles
reading dihedrals …
1198200 dihedrals
Finding 1-2 1-3 1-4 neighbors …
special bond factors lj: 0 0 0
special bond factors coul: 0 0 0
2 = max # of 1-2 neighbors
2 = max # of 1-3 neighbors
4 = max # of 1-4 neighbors
6 = max # of special neighbors