I've been having problems running simulations with the pppm solver and
the user-cuda acceleration package on Linux. Running the example
in.phosphate.cuda, I get:
terminate called after throwing an instance of 'cufftResult_t'
I googled it and found nothing. This kind of error appears in all my
scripts involving the pppm solver (lj/cut runs fine).
I compiled the cuda library successfully in both double and single
precision, using cufft=1 (whenever I used cufft=0, I wasn't able to
compile the main program). I installed only the manybody, kspace and
user-cuda packages. I also tried fftw2 and fftw3 for the Fourier
transforms. I am using the current 6 Dec version of the lammps code.
My cuda toolkit is 5.0 and the driver is up-to-date.
Any thoughts on this issue would be greatly appreciated.
I tested the 6 Dec version with CUDA 5.0 and the 310.19 driver and could not see anything going wrong. Could you send me the following:
(i) output of the crashing run
(ii) output of "nvidia-smi -a"
(iii) output of "nvcc --version"
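The two command outputs requested above can be gathered in one step. The following is a minimal Python sketch, not part of LAMMPS; the helper names `run_command` and `collect_diagnostics` and the file name `gpu_diagnostics.txt` are assumptions chosen for illustration. It only calls `nvidia-smi -a` and `nvcc --version` as quoted above, and records a note instead of failing when a tool is not on the PATH. The output of the crashing run (i) still has to be captured separately, since it depends on how LAMMPS is launched.

```python
import shutil
import subprocess


def run_command(cmd):
    """Run cmd (a list of strings) and return its combined stdout/stderr as text."""
    # Report a missing tool as text rather than raising, so one absent
    # command does not prevent the other output from being collected.
    if shutil.which(cmd[0]) is None:
        return "[not found on PATH: {}]\n".format(cmd[0])
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


def collect_diagnostics(path="gpu_diagnostics.txt"):
    """Write the output of the two requested commands to a single file."""
    commands = [["nvidia-smi", "-a"], ["nvcc", "--version"]]
    with open(path, "w") as handle:
        for cmd in commands:
            handle.write("$ {}\n".format(" ".join(cmd)))
            handle.write(run_command(cmd))
            handle.write("\n")
    return path
```

Calling `collect_diagnostics()` writes both outputs to `gpu_diagnostics.txt` in the current directory, which can then be attached to a reply.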
(i)
LAMMPS (6 Dec 2012)
# Using LAMMPS_CUDA
USER-CUDA mode is enabled (lammps.cpp:393)
# CUDA: Activate GPU
Reading data file ...
orthogonal box = (33.0201 33.0201 33.0201) to (86.9799 86.9799 86.9799)
1 by 1 by 1 MPI processor grid
10950 atoms
10950 velocities
Replicating atoms ...
orthogonal box = (33.0201 33.0201 33.0201) to (194.899 194.899 194.899)
1 by 1 by 1 MPI processor grid
295650 atoms
PPPMCuda initialization ...
G vector = 0.210111
grid = 108 108 108
stencil order = 5
absolute RMS force accuracy = 0.000126177
relative force accuracy = 8.76251e-06
brick FFT buffer size/proc = 1520875 1259712 158700
rank 0 in job 113 ipe05_45050 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
(ii)
==============NVSMI LOG==============
Timestamp : Fri Dec 14 16:09:08 2012
Driver Version : 304.54
Attached GPUs : 1
GPU 0000:02:00.0
Product Name : GeForce GTX 580
Display Mode : N/A
Persistence Mode : Disabled
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-4e627f89-c2bc-ea51-73a4-d94aa65f5af4
VBIOS Version : 70.10.60.00.82
Inforom Version
Image Version : N/A
OEM Object : N/A
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
PCI
Bus : 0x02
Device : 0x00
Domain : 0x0000
Device Id : 0x108010DE
Bus Id : 0000:02:00.0
Sub System Id : 0x15803842
GPU Link Info
PCIe Generation
Max : N/A
Current : N/A
Link Width
Max : N/A
Current : N/A
Fan Speed : 40 %
Performance State : N/A
Clocks Throttle Reasons : N/A
Memory Usage
Total : 1535 MB
Used : 4 MB
Free : 1531 MB
Compute Mode : Default
(the rest is N/A)
(iii)
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2012 NVIDIA Corporation
Built on Fri_Sep_21_17:28:58_PDT_2012
Cuda compilation tools, release 5.0, V0.2.1221
On the screen, the message "terminate called after throwing an instance
of 'cufftResult_t'" appears after the identification of the GPU card.
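As a rough plausibility check on the numbers in the log above, the raw size of one PPPM FFT grid can be estimated by hand. The sketch below is not LAMMPS code; the function name `fft_grid_bytes` and the assumption of one complex value (two reals) per grid point are illustrative. It deliberately ignores cuFFT work areas, any additional grids and buffers the solver keeps, and the per-atom data for the 295650 atoms, so it gives a lower bound only.

```python
def fft_grid_bytes(nx, ny, nz, bytes_per_real):
    """Bytes for one complex-valued nx*ny*nz grid (two reals per point)."""
    return nx * ny * nz * 2 * bytes_per_real


# Grid dimensions taken from the "grid = 108 108 108" line of the log.
points = 108 * 108 * 108
for label, width in (("single", 4), ("double", 8)):
    size = fft_grid_bytes(108, 108, 108, width)
    print("{}: {} points, {:.1f} MiB".format(label, points, size / 2.0**20))
```

The point count, 1259712, matches the second number in the "brick FFT buffer size/proc" line. One such grid takes roughly 10 MiB in single and 19 MiB in double precision, which is small next to the 1535 MB the card reports. This suggests, but does not prove, that the cuFFT failure is not a plain out-of-memory condition for the grid itself.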
Hm,
if it is not too much hassle, could you try updating to the latest driver (310.19)? I saw at least one bug fixed in it (for another code unrelated to LAMMPS).
If that is not a good idea for you, I can try downgrading instead, though funnily enough the NVIDIA download site doesn't actually list your 304.54 driver. Maybe it was a buggy beta driver and they removed it again?
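After a driver change it can be worth confirming which version is actually loaded. The following is a small Python sketch under the assumption that the `nvidia-smi -a` output contains a "Driver Version : X.Y" line, as in the log quoted earlier in this thread; the function name `driver_version` is hypothetical. Versions are compared as integer tuples rather than floats, so that for example 310.9 would sort before 310.19.

```python
import re


def driver_version(nvsmi_text):
    """Return the driver version from `nvidia-smi -a` text as a tuple of ints, or None."""
    match = re.search(
        r"^\s*Driver Version\s*:\s*([0-9]+(?:\.[0-9]+)*)\s*$",
        nvsmi_text,
        re.MULTILINE,
    )
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


# Sample lines copied from the nvidia-smi log quoted in this thread.
sample = "Timestamp : Fri Dec 14 16:09:08 2012\nDriver Version : 304.54\nAttached GPUs : 1\n"
installed = driver_version(sample)
print(installed, installed >= (310, 19))  # prints: (304, 54) False
```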
I updated the driver to 310.19, recompiled the cuda library and the
lammps code, and got the same error. It may also be relevant that I'm
using this set of libraries:
I was still not able to reproduce the error. I don't have a GTX 580 at the moment (only a C2075 on the Fermi side), so it might be a bug that only occurs on particular hardware. Would it be possible to get a temporary account on your machine, so that I can run some tests directly on it?