I see that the official LAMMPS release requires compute capability 1.3 for double precision in the GPU-supported lj pair style, but I'm wondering whether there are other limitations for hardware with compute capability 1.0. As I said in the forwarded email, I've begged some time on a Tesla S870 cluster, which has compute capability 1.0.

Thanks,

Matt

----- Forwarded message from [email protected] -----

Hi Matt. From Nvidia documentation, specs for the various compute capabilities are as follows:

*Specifications for Compute Capability 1.0

The maximum number of threads per block is 512;

The maximum sizes of the x-, y-, and z-dimension of a thread block are 512, 512, and 64, respectively;

The maximum size of each dimension of a grid of thread blocks is 65535;

The warp size is 32 threads;

The number of registers per multiprocessor is 8192;

The amount of shared memory available per multiprocessor is 16 KB organized into 16 banks;

The total amount of constant memory is 64 KB;

The cache working set for constant memory is 8 KB per multiprocessor;

The cache working set for texture memory varies between 6 and 8 KB per multiprocessor;

The maximum number of active blocks per multiprocessor is 8;

The maximum number of active warps per multiprocessor is 24;

The maximum number of active threads per multiprocessor is 768;

For a one-dimensional texture reference bound to a CUDA array, the maximum width is 2^13;

For a one-dimensional texture reference bound to linear memory, the maximum width is 2^27;

For a two-dimensional texture reference bound to linear memory or a CUDA array, the maximum width is 2^16 and the maximum height is 2^15;

For a three-dimensional texture reference bound to a CUDA array, the maximum width is 2^11, the maximum height is 2^11, and the maximum depth is 2^11;

The limit on kernel size is 2 million PTX instructions;

Each multiprocessor is composed of eight processors, so that a multiprocessor is able to process the 32 threads of a warp in four clock cycles.
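As a rough cross-check of the 1.0 figures above, here is a small host-side C++ sketch of the arithmetic they imply. The names (`cc10`, `cycles_per_warp`, `max_resident_threads`) are made up for illustration and are not part of CUDA or LAMMPS, and the register estimate ignores allocation granularity and the shared-memory and block-count limits, so treat it as an upper bound only.

```cpp
#include <cassert>

// Compute capability 1.0 limits, copied from the list above.
namespace cc10 {
constexpr int warp_size = 32;
constexpr int max_active_warps = 24;
constexpr int max_active_threads = 768;
constexpr int registers_per_multiprocessor = 8192;
constexpr int processors_per_multiprocessor = 8;
}  // namespace cc10

// The thread limit is just the warp limit times the warp size.
static_assert(cc10::max_active_warps * cc10::warp_size == cc10::max_active_threads,
              "24 warps of 32 threads give 768 threads");

// Clock cycles for eight processors to work through one 32-thread warp.
constexpr int cycles_per_warp() {
    return cc10::warp_size / cc10::processors_per_multiprocessor;
}

// Hypothetical helper: upper bound on resident threads per multiprocessor
// for a kernel using the given number of registers per thread.
// Simplification: only the register file and the thread limit are considered.
inline int max_resident_threads(int registers_per_thread) {
    assert(registers_per_thread > 0);
    const int by_registers = cc10::registers_per_multiprocessor / registers_per_thread;
    return by_registers < cc10::max_active_threads ? by_registers
                                                   : cc10::max_active_threads;
}
```

Under these assumptions, a kernel needs to stay at about 10 registers per thread (8192 / 768, rounded down) to reach the full 768 resident threads; at 16 registers per thread the bound drops to 512.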

*Specifications for Compute Capability 1.1

Support for atomic functions operating on 32-bit words in global memory.

*Specifications for Compute Capability 1.2

Support for atomic functions operating in shared memory and atomic functions operating on 64-bit words in global memory;

Support for warp vote functions;

The number of registers per multiprocessor is 16384;

The maximum number of active warps per multiprocessor is 32;

The maximum number of active threads per multiprocessor is 1024.

*Specifications for Compute Capability 1.3

Support for double-precision floating-point numbers.

So, yes, you'll need 1.3 in order to do double precision, and at least 1.1 to get the atomic functions (1.2 for atomics in shared memory or on 64-bit words).
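To make the cumulative nature of those levels concrete, here is a minimal C++ sketch that maps a 1.x minor revision to the features listed above. The struct and function names are hypothetical, and it only covers capabilities 1.0 through 1.3 as quoted in this message. In a real program the major and minor numbers would come from the `major` and `minor` fields of `cudaDeviceProp`, filled in by `cudaGetDeviceProperties`; this sketch takes the minor number as a plain argument so it runs without a GPU.

```cpp
#include <cassert>

// Hypothetical summary of what a compute capability 1.x device offers,
// based only on the specification lists quoted above.
struct GpuFeatures {
    bool global_atomics_32bit;      // added in 1.1
    bool shared_and_64bit_atomics;  // added in 1.2
    bool warp_vote;                 // added in 1.2
    bool double_precision;          // added in 1.3
    int registers_per_multiprocessor;
    int max_active_warps;
    int max_active_threads;
};

// Each revision keeps everything from the one before it and adds more.
inline GpuFeatures features_for_1x(int minor) {
    assert(minor >= 0 && minor <= 3);
    GpuFeatures f;
    f.global_atomics_32bit = (minor >= 1);
    f.shared_and_64bit_atomics = (minor >= 2);
    f.warp_vote = (minor >= 2);
    f.double_precision = (minor >= 3);
    // 1.2 doubles the register file and raises the warp/thread limits.
    f.registers_per_multiprocessor = (minor >= 2) ? 16384 : 8192;
    f.max_active_warps = (minor >= 2) ? 32 : 24;
    f.max_active_threads = f.max_active_warps * 32;  // warp size is 32
    return f;
}
```

For a 1.0 device such as the S870, `features_for_1x(0)` reports no atomics and no double precision, so any code path that relies on either would need a single-precision, atomics-free fallback.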

Paul