[lammps-users] Re: lammps gpu package

_Matt_K_Petersen · March 4, 2010, 4:47pm

I see that the official LAMMPS release requires compute capability 1.3 for double precision in the GPU-accelerated LJ pair style, but I'm wondering whether there are other limitations for hardware with compute capability 1.0. As I said in the forwarded email, I've begged some time on a Tesla S870 cluster, which has compute capability 1.0.
Thanks,
Matt

----- Forwarded message -----

Crozier_Paul_S · March 4, 2010, 6:14pm

Hi Matt. According to the NVIDIA documentation, the specifications for the various compute capabilities are as follows:

*Specifications for Compute Capability 1.0
The maximum number of threads per block is 512;
The maximum sizes of the x-, y-, and z-dimension of a thread block are 512, 512, and 64, respectively;
The maximum size of each dimension of a grid of thread blocks is 65535;
The warp size is 32 threads;
The number of registers per multiprocessor is 8192;
The amount of shared memory available per multiprocessor is 16 KB organized into 16 banks;
The total amount of constant memory is 64 KB;
The cache working set for constant memory is 8 KB per multiprocessor;
The cache working set for texture memory varies between 6 and 8 KB per multiprocessor;
The maximum number of active blocks per multiprocessor is 8;
The maximum number of active warps per multiprocessor is 24;
The maximum number of active threads per multiprocessor is 768;
For a one-dimensional texture reference bound to a CUDA array, the maximum width is 2^13 (8192);
For a one-dimensional texture reference bound to linear memory, the maximum width is 2^27;
For a two-dimensional texture reference bound to linear memory or a CUDA array, the maximum width is 2^16 and the maximum height is 2^15;
For a three-dimensional texture reference bound to a CUDA array, the maximum width is 2^11, the maximum height is 2^11, and the maximum depth is 2^11;
The limit on kernel size is 2 million PTX instructions;
Each multiprocessor is composed of eight processors, so that a multiprocessor is able to process the 32 threads of a warp in four clock cycles.
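To make the per-multiprocessor limits above concrete, here is a rough Python sketch of how they cap the number of resident blocks for a kernel. It assumes a hypothetical kernel described only by its block size, registers per thread, and shared memory per block, and it ignores the allocation granularity and launch overhead that real hardware (and NVIDIA's occupancy calculator) account for, so treat its numbers as estimates rather than what a Tesla S870 would actually report.

```python
# Rough occupancy estimate for one compute capability 1.0 multiprocessor,
# using only the limits quoted in the list above. Register/shared-memory
# allocation granularity and launch overhead are deliberately ignored.

WARP_SIZE = 32

CC10_LIMITS = {
    "registers_per_mp": 8192,
    "shared_mem_per_mp": 16 * 1024,  # bytes
    "max_blocks_per_mp": 8,
    "max_warps_per_mp": 24,  # 24 warps * 32 threads = 768 threads
    "max_threads_per_block": 512,
}


def warps_per_block(threads_per_block):
    # A partially filled warp still occupies a whole warp slot (ceiling division).
    return -(-threads_per_block // WARP_SIZE)


def active_blocks_per_mp(threads_per_block, regs_per_thread, smem_per_block,
                         limits=CC10_LIMITS):
    """Blocks of a hypothetical kernel that can be resident on one multiprocessor."""
    if threads_per_block <= 0 or threads_per_block > limits["max_threads_per_block"]:
        return 0  # such a launch configuration is invalid
    caps = [
        limits["max_blocks_per_mp"],
        limits["max_warps_per_mp"] // warps_per_block(threads_per_block),
    ]
    if regs_per_thread > 0:
        caps.append(limits["registers_per_mp"] // (regs_per_thread * threads_per_block))
    if smem_per_block > 0:
        caps.append(limits["shared_mem_per_mp"] // smem_per_block)
    return min(caps)


def occupancy(threads_per_block, regs_per_thread, smem_per_block, limits=CC10_LIMITS):
    """Fraction of the multiprocessor's warp slots that are in use."""
    blocks = active_blocks_per_mp(threads_per_block, regs_per_thread,
                                  smem_per_block, limits)
    return blocks * warps_per_block(threads_per_block) / limits["max_warps_per_mp"]


print(active_blocks_per_mp(128, 16, 0))  # 4 (register-limited)
print(occupancy(256, 10, 0))  # 1.0
```

For example, a 128-thread block using 16 registers per thread is register-limited to 4 blocks (16 of 24 warps), while a 256-thread block using 10 registers per thread fills all 24 warps.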

*Specifications for Compute Capability 1.1
Support for atomic functions operating on 32-bit words in global memory.

*Specifications for Compute Capability 1.2
Support for atomic functions operating in shared memory and atomic functions operating on 64-bit words in global memory;
Support for warp vote functions;
The number of registers per multiprocessor is 16384;
The maximum number of active warps per multiprocessor is 32;
The maximum number of active threads per multiprocessor is 1024.

*Specifications for Compute Capability 1.3
Support for double-precision floating-point numbers.

So, yes, you'll need 1.3 in order to do double precision, and at least 1.1 to get the atomic functions.
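The answer above amounts to a threshold check on the (major, minor) version. The Python sketch below encodes only the 1.x feature list from this message; the feature names and the `supported_features` helper are hypothetical, and the table assumes each feature remains available at every later 1.x capability, as the cumulative list implies.

```python
# Hypothetical lookup table built from the compute capability 1.x list above:
# each feature maps to the lowest (major, minor) version that provides it.
FEATURE_MIN_CC = {
    "atomics_global_32bit": (1, 1),
    "atomics_shared": (1, 2),
    "atomics_global_64bit": (1, 2),
    "warp_vote": (1, 2),
    "double_precision": (1, 3),
}


def supported_features(major, minor):
    """Return the set of listed features available at compute capability major.minor."""
    cc = (major, minor)
    # Tuples compare lexicographically, so (1, 3) >= (1, 2) as expected.
    return {name for name, min_cc in FEATURE_MIN_CC.items() if cc >= min_cc}


# A Tesla S870 reports compute capability 1.0.
print(sorted(supported_features(1, 0)))  # []
print("double_precision" in supported_features(1, 3))  # True
```

On a real system the two numbers would come from the `major` and `minor` fields of the `cudaDeviceProp` structure filled in by `cudaGetDeviceProperties`.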

Paul