GPU run does not give the correct energy

Dear All,

I ran the example found in lammps/examples/melt and changed the file
in.melt so that it can run on a GPU:
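For reference, here is a minimal sketch of a GPU-enabled in.melt for this LAMMPS version: the stock examples/melt script with the pair style switched to lj/cut/gpu and a gpu fix added. The force/neigh mode in the fix line is an assumption; the plain force-mode variant is the one suggested later in this thread.

# 3d Lennard-Jones melt -- GPU sketch (assumed changes marked below)

units           lj
atom_style      atomic

lattice         fcc 0.8442
region          box block 0 10 0 10 0 10
create_box      1 box
create_atoms    1 box
mass            1 1.0

velocity        all create 3.0 87287

pair_style      lj/cut/gpu 2.5                # changed from lj/cut 2.5
pair_coeff      1 1 1.0 1.0 2.5

neighbor        0.3 bin
neigh_modify    every 20 delay 0 check no

fix             0 all gpu force/neigh 0 0 1   # assumed mode; "force" is tested later
fix             1 all nve

run             250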

My first concern would be that your CPU output does not match the log in
the examples directory or my CPU output. Do you know why this is?

My last LAMMPS update was yesterday and I am running a newer version of
the GPU lib. The CPU and GPU outputs match, and they match the log file in
the examples directory.

- Mike

I changed the line "velocity all create 3.0 87287" to "velocity all
create 1.0 87287". The CPU output should be:

LAMMPS (31 Mar 2011)
Lattice spacing in x,y,z = 1.6796 1.6796 1.6796
Created orthogonal box = (0 0 0) to (16.796 16.796 16.796)
  1 by 1 by 1 processor grid
Created 4000 atoms
Setting up run ...
Memory usage per processor = 2.35378 Mbytes
Step Temp E_pair E_mol TotEng Press
       0 3 -6.7733681 0 -2.2744931 -3.7033504
      50 1.6758903 -4.7955425 0 -2.2823355 5.670064
     100 1.6458363 -4.7492704 0 -2.2811332 5.8691042
     150 1.6324555 -4.7286791 0 -2.280608 5.9589514
     200 1.6630725 -4.7750988 0 -2.2811136 5.7364886
     250 1.6275257 -4.7224992 0 -2.281821 5.9567365
Loop time of 1.35566 on 1 procs for 250 steps with 4000 atoms

Pair  time (%) = 1.10385 (81.425)
Neigh time (%) = 0.119103 (8.78562)
Comm  time (%) = 0.0438032 (3.23114)
Outpt time (%) = 0.0291114 (2.1474)
Other time (%) = 0.0597966 (4.41088)

Nlocal: 4000 ave 4000 max 4000 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 5499 ave 5499 max 5499 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 151513 ave 151513 max 151513 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 151513
Ave neighs/atom = 37.8783
Neighbor list builds = 12
Dangerous builds = 0

The simulation box just blows up. Here is the output:

=======================================
LAMMPS (31 Mar 2011)
Lattice spacing in x,y,z = 1.6796 1.6796 1.6796
Created orthogonal box = (0 0 0) to (16.796 16.796 16.796)
1 by 1 by 1 processor grid
Created 4000 atoms

--------------------------------------------------------------------------
- Using GPGPU acceleration for lj/cut:
- with 1 procs per device.
--------------------------------------------------------------------------
GPU 0: GeForce GTX 260, 216 cores, 0.81/0.87 GB, 1.3 GHZ (Mixed Precision)
--------------------------------------------------------------------------

Initializing GPU and compiling on process 0...Done.
Initializing GPU 0 on core 0...Done.

Setting up run ...
Memory usage per processor = 2.44226 Mbytes
Step Temp E_pair E_mol TotEng Press
0 3 -4804477.8 0 -4804473.3 -1882451.4
ERROR: Lost atoms: original 4000 current 327
Cuda driver error 4 in call at file 'geryon/nvd_timer.h' in line 83.
*** An error occurred in MPI_Abort
*** after MPI was finalized
*** MPI_ERRORS_ARE_FATAL (goodbye)
[gpu0:20559] Abort before MPI_INIT completed successfully; not able to
guarantee that all other processes were killed!

It works for me:

It must be how I compiled /lib/gpu then. I used the Makefile:

Still no problem here. What version of the CUDA driver and runtime are
reported by nvc_get_devices, and what arch flag are you using in the
Makefile? To help me figure this out, try changing the fix to

fix 0 all gpu force 0 0 1

and see if that works.
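For context, here is a sketch of the two fix variants being compared; the force/neigh form of the original script is an assumption. force/neigh builds the neighbor list on the GPU as well as computing the pair forces there, while force keeps neighbor builds on the CPU and offloads only the pair force, which helps isolate whether the GPU neighbor build is the culprit.

# assumed original: pair force and neighbor build both on the GPU
fix 0 all gpu force/neigh 0 0 1

# suggested test: neighbor build stays on the CPU, only pair force on the GPU
fix 0 all gpu force 0 0 1

On the arch flag: a GeForce GTX 260 is a compute capability 1.3 card, so the CUDA arch setting in the lib/gpu Makefile would typically be -arch=sm_13.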

- Mike

I already did fix 0 all gpu force 0 0 1 and it still blows up.

nvc_get_devices shows: