Dear all,
I am still fighting with my GTX 680 and LAMMPS. I switched my Linux distribution to Scientific Linux 6.3 and now I have both acceleration libraries compiled (GPU & USER-CUDA). My first problem concerns the GPU library: I can run the benchmark problems, but when I enlarge the simulation box, LAMMPS gets stuck right after the GPU initialization phase. Here is some context for the problem:
File: lammps/examples/gpu/in.gpu.rhodo
Command line: mpirun -np 8 /home/ekhi/bin/lmp_openmpi.gea.gpu < in.gpu.rhodo > out.gpu.rhodo.8
Here is the content of out.gpu.rhodo.8:
LAMMPS (13 Sep 2012)
Scanning data file ...
  4 = max bonds/atom
  18 = max angles/atom
  40 = max dihedrals/atom
  4 = max impropers/atom
Reading data file ...
  orthogonal box = (-27.5 -38.5 -36.2676) to (27.5 38.5 36.2645)
  2 by 2 by 2 MPI processor grid
  32000 atoms
  32000 velocities
  27723 bonds
  40467 angles
  56829 dihedrals
  1034 impropers
Finding 1-2 1-3 1-4 neighbors ...
  4 = max # of 1-2 neighbors
  12 = max # of 1-3 neighbors
  24 = max # of 1-4 neighbors
  26 = max # of special neighbors
Replicating atoms ...
  orthogonal box = (-27.5 -38.5 -36.2676) to (82.5 115.5 108.797)
  2 by 2 by 2 MPI processor grid
  256000 atoms
  221784 bonds
  323736 angles
  454632 dihedrals
  8272 impropers
Finding 1-2 1-3 1-4 neighbors ...
  4 = max # of 1-2 neighbors
  12 = max # of 1-3 neighbors
  24 = max # of 1-4 neighbors
  26 = max # of special neighbors
Finding SHAKE clusters ...
  12936 = # of size 2 clusters
  29064 = # of size 3 clusters
  5976 = # of size 4 clusters
  33864 = # of frozen angles
PPPM initialization ...
  G vector (1/distance)= 0.245959
  grid = 48 64 60
  stencil order = 5
  estimated absolute RMS force accuracy = 0.0410392
  estimated relative force accuracy = 0.000123588
  using double precision FFTs
  brick FFT buffer size/proc = 37555 24576 11655

--------------------------------------------------------------------------
- Using GPGPU acceleration for pppm:
- with 8 proc(s) per device.
--------------------------------------------------------------------------
GPU 0: GeForce GTX 680, 1536 cores, 1.6/2 GB, 0.71 GHZ (Double Precision)
--------------------------------------------------------------------------

Initializing GPU and compiling on process 0...Done.
Initializing GPU 0 on core 0...Done.
Initializing GPU 0 on core 1...Done.
Initializing GPU 0 on core 2...Done.
Initializing GPU 0 on core 3...Done.
Initializing GPU 0 on core 4...Done.
Initializing GPU 0 on core 5...Done.
Initializing GPU 0 on core 6...Done.
Initializing GPU 0 on core 7...Done.
--------------------------------------------------------------------------
- Using GPGPU acceleration for lj/charmm/coul/long:
- with 8 proc(s) per device.
--------------------------------------------------------------------------
GPU 0: GeForce GTX 680, 1536 cores, 1.4/2 GB, 0.71 GHZ (Double Precision)
--------------------------------------------------------------------------

Initializing GPU and compiling on process 0...Done.
Initializing GPU 0 on core 0...Done.
Initializing GPU 0 on core 1...Done.
Initializing GPU 0 on core 2...Done.
Initializing GPU 0 on core 3...Done.
Initializing GPU 0 on core 4...Done.
Initializing GPU 0 on core 5...Done.
Initializing GPU 0 on core 6...Done.
Initializing GPU 0 on core 7...Done.

Setting up run ...
After that, the run gets stuck while still consuming CPU resources.
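While it is stuck I can attach a debugger to one of the ranks to see where it is spinning; a rough sketch with standard tools (nothing LAMMPS-specific, <PID> is whichever rank you pick):

  pgrep -f lmp_openmpi.gea.gpu                      # list the PIDs of the stuck MPI ranks
  gdb -batch -p <PID> -ex 'thread apply all bt'     # print a backtrace of one rank

If such a backtrace would be useful, I can post it.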
The same behavior can be reproduced with the bench/GPU cases, but when I make the simulation box small enough, the case runs fine.
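Since the card only has 2 GB of memory, one thing I can watch while the run is stuck is the GPU memory usage, with the standard NVIDIA tool:

  nvidia-smi -l 1    # reports memory used/total and GPU utilization, refreshed every second

in case the larger box is simply exhausting the 2 GB on the card.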
Any help?
Thanks a lot.