I added all of the packages except for meam & reax, and it compiled
successfully, and ran the normal examples. But...
When I try to run the gpulammps example gb.in, my computer partly
freezes. I can still move the mouse, but the display does not update in any
other way.
I use
mpirun -np 2 lammps < gb.in
It gets as far as compiling the GPU program, but that's where it gets stuck.
Is there anything that I'm doing wrong?
if you are in graphics mode, you have competition between
software wanting to update the display (and using the GPU
for it) and your GPU computing requests. even with a very
powerful GPU, running a CUDA code makes the graphics
go slower, sometimes a lot. the nvidia driver typically has
a timeout setting that will kill the CUDA job if it doesn't
free the GPU fast enough. thus i would do my experiments
in text mode rather than in graphics mode.
also, you should not oversubscribe the GPU, i.e. don't
try to use more MPI tasks than you have GPUs.
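as a rough sketch of that rule, the snippet below caps the number of
MPI tasks at the number of GPUs and builds the matching launch line.
`capped_tasks` and `launch_line` are made-up helper names, and the GPU
count is an assumption you have to supply yourself (a single GPU is
assumed here). only the `mpirun -np N lammps < gb.in` form is taken
from the command quoted above.

```python
def capped_tasks(requested_tasks, num_gpus):
    """Return the number of MPI tasks to launch without oversubscribing the GPUs."""
    if requested_tasks < 1 or num_gpus < 1:
        raise ValueError("need at least one MPI task and one GPU")
    # never start more MPI tasks than there are GPUs
    return min(requested_tasks, num_gpus)


def launch_line(requested_tasks, num_gpus, executable="lammps", input_file="gb.in"):
    """Build the mpirun command line for the capped task count (hypothetical helper)."""
    tasks = capped_tasks(requested_tasks, num_gpus)
    return f"mpirun -np {tasks} {executable} < {input_file}"


if __name__ == "__main__":
    # 2 tasks were requested; ASSUMPTION: the machine has a single GPU
    print(launch_line(2, 1))  # prints: mpirun -np 1 lammps < gb.in
```

with two requested tasks and one GPU this gives a single MPI task, so
only one process talks to the GPU.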
you have only 16 cores of 1.1 compute capability,
little memory, and a stripped down mobile GPU to
boot. there is not much performance to be expected.
if a gay-berne system ran 2x faster on the GPU
than on the CPU, that would probably be good
performance.
just compare your GPU with the specs of the previous
generation tesla card that i have here. it already has
15x as many cores.
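the factor of 15 is simply the ratio of the core counts: 16 cores on
your GPU against the 240 reported for the Tesla C1060 in the listing
below. a tiny sketch of that arithmetic follows. the variable names are
illustrative only, and core count alone ignores clock rate and memory,
so treat the ratio as a rough indicator rather than a speedup estimate.

```python
mobile_gpu_cores = 16     # core count quoted above for the mobile GPU
tesla_c1060_cores = 240   # "Number of cores" in the Tesla C1060 listing

# how many times as many cores the Tesla card has
ratio = tesla_c1060_cores / mobile_gpu_cores
print(f"core ratio: {ratio:.0f}x")  # prints: core ratio: 15x
```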
axel.
Found 1 platform(s).
Using platform: NVIDIA Corporation NVIDIA CUDA
CUDA Driver Version: 3.20
CUDA Runtime Version: 3.20
Device 0: "Tesla C1060"
Type of device: GPU
Compute capability: 1.3
Double precision support: Yes
Total amount of global memory: 3.99982 GB
Number of compute units/multiprocessors: 30
Number of cores: 240
Total amount of constant memory: 65536 bytes
Total amount of local/shared memory per block: 16384 bytes
Total number of registers available per block: 16384
Warp size: 32
Maximum number of threads per block: 512
Maximum group size (# of threads per block) 512 x 512 x 64
Maximum item sizes (# threads for each dim) 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 256 bytes
Clock rate: 1.296 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default
Concurrent kernel execution: No
Device has ECC support enabled: No
if you compare this with a 1.5 year old high-end desktop