Here is the result
NVIDIA: could not open the device file /dev/nvidiactl (No such file or
directory).
Failed to initialize NVML: Unknown Error
Well, maybe the toolkit I installed is too new (4.0). My driver is
275.09.07, 64-bit. I really don't know if they are compatible.
Luis
OK, I added the flag -fno-rtti and the compilation succeeded. I
tested the executable with this fix
fix gpuConf all gpu force 0 0 -1
and kspace_style pppm/gpu/single. Running on 6 processors, the following
message came up:
Cuda driver error 100 in call at file 'geryon/nvd_device.h' in line
207.
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
Cuda driver error 100 in call at file 'geryon/nvd_device.h' in line
207.
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 4
Cuda driver error 100 in call at file 'geryon/nvd_device.h' in line
207.
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 2
Cuda driver error 100 in call at file 'geryon/nvd_device.h' in line
207.
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
Cuda driver error 100 in call at file 'geryon/nvd_device.h' in line
207.
application called MPI_Abort(MPI_COMM_WORLD, -1) - process 3
Cuda driver error 100 in call at file 'geryon/nvd_device.h' in line
207.
Hi
That is a common problem. Sometimes the device entries are not created.
This should help (as root):
mknod /dev/nvidia0 c 195 0 ; chmod 666 /dev/nvidia0
mknod /dev/nvidiactl c 195 255 ; chmod 666 /dev/nvidiactl
If you have more than one GPU, you need additional lines like the first one, with increasing numbers. E.g., for a second GPU you would do:
mknod /dev/nvidia1 c 195 1 ; chmod 666 /dev/nvidia1
You might need to add that to your bootup script.
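For a machine with several GPUs, the mknod/chmod lines above can be generated rather than typed by hand. A minimal sketch (the GPU count NGPUS is an assumption, and the script only prints the commands so you can review them before running them as root or pasting them into your bootup script):

```shell
#!/bin/sh
# Sketch of Christian's recipe for several GPUs: print the mknod/chmod
# commands for NGPUS cards plus the nvidiactl node. NGPUS is an
# assumption; adjust it to your machine, then run the output as root.
NGPUS=2
i=0
while [ "$i" -lt "$NGPUS" ]; do
    echo "mknod /dev/nvidia$i c 195 $i ; chmod 666 /dev/nvidia$i"
    i=$((i + 1))
done
echo "mknod /dev/nvidiactl c 195 255 ; chmod 666 /dev/nvidiactl"
```

Piping the output through sh as root creates all the nodes in one go.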
Cheers
Christian
-------- Original Message --------
Here is the result
NVIDIA: could not open the device file /dev/nvidiactl (No such file or
directory).
well, there is your problem. due to the udev system,
the cuda devices get deleted at every reboot. you either
have to configure udev to create them for you with the proper
permissions, or create them yourself at boot.
the most convenient way to create those devices is to run
nvidia-smi -a as root. just add it to /etc/rc.d/rc.local
Failed to initialize NVML: Unknown Error
Well, maybe the toolkit I installed is too new (4.0). My driver is
275.09.07, 64-bit. I really don't know if they are compatible.
yes they are:
[[email protected]... gpu]$ ./nvc_get_devices
Found 1 platform(s).
Using platform: NVIDIA Corporation NVIDIA CUDA
CUDA Driver Version: 4.0
CUDA Runtime Version: 4.0
Device 0: "GeForce GTX 560 Ti"
Type of device: GPU
Compute capability: 2.1
Double precision support: Yes
Total amount of global memory: 0.999207 GB
Number of compute units/multiprocessors: 8
Number of cores: 256
Total amount of constant memory: 65536 bytes
Total amount of local/shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum group size (# of threads per block) 1024 x 1024 x 64
Maximum item sizes (# threads for each dim) 65535 x 65535 x 65535
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 1.645 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default
Concurrent kernel execution: Yes
Device has ECC support enabled: No
[[email protected]... gpu]$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 275.09.07 Wed Jun 8
14:16:46 PDT 2011
GCC version: gcc version 4.4.4 20100630 (Red Hat 4.4.4-10) (GCC)
Hi,
Now the nvidia-smi -a gives:
==============NVSMI LOG==============
Timestamp : Mon Jun 20 18:25:08 2011
Driver Version : 270.41.19
Attached GPUs : 1
GPU 0:3:0
Product Name : GeForce GT 240
Display Mode : N/A
Persistence Mode : Disabled
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : N/A
Inforom Version
OEM Object : N/A
ECC Object : N/A
Power Management Object : N/A
PCI
Bus : 3
Device : 0
Domain : 0
Device Id : CA310DE
Bus Id : 0:3:0
Fan Speed : 41 %
Memory Usage
Total : 1023 Mb
Used : 40 Mb
Free : 982 Mb
Compute Mode : Default
Utilization
Gpu : N/A
Memory : N/A
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Total : N/A
Temperature
Gpu : 33 C
Power Readings
Power State : N/A
Power Management : N/A
Power Draw : N/A
Power Limit : N/A
Clocks
Graphics : N/A
SM : N/A
Memory : N/A
While running with fix gpu 0 0 1 and pppm/gpu/single, I have not obtained
any improvement in performance running on 1 or 2 CPU cores. I am running a
9216-atom system with the buck/coul/long potential.
What should be the speedup in this case?
Best regards,
Luis
While running with fix gpu 0 0 1 and pppm/gpu/single, I have not obtained
any improvement in performance running on 1 or 2 CPU cores. I am running a
9216-atom system with the buck/coul/long potential.
What should be the speedup in this case?
hard to say if there would be any speedup. your GPU model is pretty
limited in terms of memory bandwidth and floating point performance
(clock rate and number of cores). all of that has significant impact on
the amount of acceleration possible. please note that this pair style is
only supported for acceleration with the USER-CUDA package, not
the GPU package. does the output show which GPU package is active?
cheers,
axel.
I plan on using the pppm acceleration. Below is an extract of the output
I plan on using the pppm acceleration. Below is an extract of the output
that is a pretty pointless thing, if the majority of the time
is spent on computing the real-space interactions.
have a look at amdahl's law and be enlightened.
cheers,
axel.
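Axel's point can be made concrete with Amdahl's law: if only a fraction f of the runtime is accelerated by a factor s, the overall speedup is 1 / ((1 - f) + f / s), so even an infinite s caps the gain at 1 / (1 - f). A quick sketch, where the 10% k-space fraction is just an assumed example:

```shell
# Amdahl's law sketch: overall = 1 / ((1 - f) + f / s), where f is the
# fraction of the runtime being accelerated and s its speedup. With
# k-space at an assumed 10% of the runtime, even a huge s barely helps.
f=0.10      # assumed fraction of runtime spent in k-space
s=1000      # speedup of that fraction (effectively "infinite")
awk -v f="$f" -v s="$s" \
    'BEGIN { printf "overall speedup: %.2fx\n", 1 / ((1 - f) + f / s) }'
# prints: overall speedup: 1.11x
```

So accelerating PPPM alone, while the real-space pair interactions stay on the CPU, can never buy more than about 11% here.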
I don't think that PPPM with GPU acceleration should be slower than a CPU run. I am happy to look at the screen output, including GPU timings, for both the CPU and GPU runs; however, there are some things you should think about before pursuing this:
1. pppm/gpu can provide significant speedups for cases where the k-space computation represents a significant fraction of the total runtime. This will happen for certain simulations, mostly when using GPU acceleration for the pair forces. If k-space is 10% of the runtime, however, the absolute best you can do is a 10% improvement in performance. k-space times for parallel jobs can be communication-bound, and GPU acceleration will not help with that.
2. Since the pair computation on the CPU is 90% of the runtime, running the simulation on 2 cores should be faster even if for some reason the pppm/gpu time is a little slower - you are dividing >90% of the work between different CPU cores.
3. A new GPU is not going to help much if buck/coul/long is not available for GPU acceleration.
4. With only 9216 atoms, the speedup with a good GPU and a port of buck/coul/long versus a hex-core Opteron will probably be between 2 and 3 times. I do believe LAMMPS can be improved for smaller problem sizes; however, I also think this is unlikely to happen soon (at least with full feature compatibility).
- Mike
Luis Goncalves wrote:
Firstly, thank you all for the great help.
It appears to me that I am better off using the USER-CUDA package because
it will speed up the pair-interaction portion. pppm/gpu will then help me,
since the k-space timings will increase in comparison with the pair
timings. Besides, my pair potential is already implemented in USER-CUDA.
My other concern is regarding multiple cores and gpu processing. If I
understood correctly, item 4 below means that
1 gpu + 1 cpu = (6 cpus) x 2 times faster
if I use gpu for pair interactions. Is that right?
Cheers,
Luis
This is a future option if you have a different GPU (compute capability
1.3 is required for user-cuda - yours is 1.2). Regarding your question,
I meant that the best time you could get on a 6-core opteron with GPU
would be 2-3x the best time you could get without. The "cuda" pkg is
entirely different code with different performance.
- Mike