Hi,
I encountered an error while running the sample input script in.rhodo.cuda with the USER-CUDA package, although the other two sample scripts (in.eam.cuda, in.lj.cuda) run normally.
I only changed the precision to mixed precision and left all other settings at their defaults.
- Command:
mpirun -np 1 ./cuda_mix -sf cuda -v g 1 -v x 1 -v y 1 -v z 1 -v t 100 < in.rhodo.cuda
- Error message:
Using device 0: Quadro 6000
Cuda error: FixShakeCuda_Shake: Kernel execution failed in file 'fix_shake_cuda.cu' in line 156 : unspecified launch failure.
- Output:
LAMMPS (4 Jul 2012)
Using LAMMPS_CUDA
USER-CUDA mode is enabled (lammps.cpp:396)
using 1 OpenMP thread(s) per MPI task
CUDA: Activate GPU
Scanning data file …
4 = max bonds/atom
8 = max angles/atom
18 = max dihedrals/atom
2 = max impropers/atom
Reading data file …
orthogonal box = (-27.5 -38.5 -36.3646) to (27.5 38.5 36.3615)
1 by 1 by 1 MPI processor grid
32000 atoms
32000 velocities
27723 bonds
40467 angles
56829 dihedrals
1034 impropers
Finding 1-2 1-3 1-4 neighbors …
4 = max # of 1-2 neighbors
12 = max # of 1-3 neighbors
24 = max # of 1-4 neighbors
26 = max # of special neighbors
Replicating atoms …
orthogonal box = (-27.5 -38.5 -36.3646) to (27.5 38.5 36.3615)
1 by 1 by 1 MPI processor grid
32000 atoms
27723 bonds
40467 angles
56829 dihedrals
1034 impropers
Finding 1-2 1-3 1-4 neighbors …
4 = max # of 1-2 neighbors
12 = max # of 1-3 neighbors
24 = max # of 1-4 neighbors
26 = max # of special neighbors
Finding SHAKE clusters …
1617 = # of size 2 clusters
3633 = # of size 3 clusters
747 = # of size 4 clusters
4233 = # of frozen angles
PPPMCuda initialization …
G vector = 0.248831
grid = 25 32 32
stencil order = 5
absolute RMS force accuracy = 0.025142
relative force accuracy = 7.57143e-05
brick FFT buffer size/proc = 41070 25600 12321
WARNING: # CUDA: You asked for the usage of Coulomb Tables. This is not supported in CUDA Pair forces. Setting is ignored.
(pair_lj_charmm_coul_long_cuda.cpp:171)
CUDA: VerletCuda::setup: Allocate memory on device for maximum of 32000 atoms…
CUDA: Using precision: Global: 4 X: 8 V: 8 F: 4 PPPM: 4
Setting up run …
CUDA: VerletCuda::setup: Upload data…
Test TpA
Test BpA
CUDA: Timing of parallelisation layout with 10 loops:
CUDA: BpA TpA
16.827072 18.359028
CUDA: Total Device Memory useage post setup: 168.070312 MB
Memory usage per processor = 98.3832 Mbytes
---------------- Step 0 ----- CPU = 0.0000 (sec) ----------------
TotEng = -25356.1745 KinEng = 21444.8303 Temp = 299.0397
PotEng = -46801.0048 E_bond = 2537.9940 E_angle = 10921.3742
E_dihed = 5211.7865 E_impro = 213.5116 E_vdwl = -2307.8633
E_coul = 207021.6923 E_long = -270399.5001 Press = -142.5990
Volume = 307995.0335
========= CUDA-MEMCHECK
========= Invalid global read of size 4
========= at 0x00008a60 in FixShakeCuda_Shake_Kernel
========= by thread (0,0,0) in block (82,0,0)
========= Address 0x0000cb74 is out of bounds