Hi, I'd like to work directly with 64-bit words on the CUDA platform (e.g. uint64_t variables). I understand, however, that the address space, the registers and the SP architecture are all 32-bit based.

I actually found this to work correctly (on my CUDA cc1.1 card):

__global__ void test64Kernel( uint64_t *word )
{
    (*word) <<= 56;
}

but I don't know, for example, how this affects register usage or the number of operations per clock cycle.

A: 

Whether addresses are 32-bit or anything else does not affect what data types you can use. In your example you have a pointer (32-bit, 64-bit, 3-bit (!) - doesn't matter) to a 64-bit unsigned integer.
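To make the pointer/pointee distinction concrete, here is a minimal sketch in plain C (it compiles as host code; the function names are made up for illustration). It assumes the usual 8-bit byte, in which case the pointed-to uint64_t is 8 bytes no matter how wide the pointer itself is:

```c
#include <stddef.h>
#include <stdint.h>

/* Size of the value being pointed at: 8 bytes for uint64_t
   (assuming 8-bit bytes), independent of the address width. */
size_t pointee_size(void)
{
    return sizeof(uint64_t);
}

/* Size of the pointer itself: typically 4 bytes on a 32-bit
   target and 8 bytes on a 64-bit target. */
size_t pointer_size(void)
{
    return sizeof(uint64_t *);
}
```

Only pointer_size() changes between a 32-bit and a 64-bit build; pointee_size() stays the same.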

64-bit integers are supported in CUDA, but every 64-bit value stores twice as much data as a 32-bit value, so it will use more registers and arithmetic operations will take longer: adding two 64-bit integers is simply expanded into operations on the smaller datatype, using carries to push into the next sub-word. The compiler is an optimising compiler, so it will try to minimise the impact of this.
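As a rough illustration of that expansion, the sketch below does a 64-bit add by hand on two 32-bit halves, in plain C. The u64_pair type and the split64, join64 and add64_via_32 names are hypothetical, and this shows only the idea, not the actual instruction sequence the compiler generates:

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, roughly how a
   32-bit machine would keep it in a pair of registers. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* Split a 64-bit value into its low and high 32-bit words. */
u64_pair split64(uint64_t v)
{
    u64_pair p;
    p.lo = (uint32_t)v;
    p.hi = (uint32_t)(v >> 32);
    return p;
}

/* Reassemble the two halves into one 64-bit value. */
uint64_t join64(u64_pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit add built from two 32-bit adds plus a carry. */
u64_pair add64_via_32(u64_pair a, u64_pair b)
{
    u64_pair r;
    uint32_t carry;

    r.lo = a.lo + b.lo;            /* low words, wraps modulo 2^32 */
    carry = (r.lo < a.lo) ? 1u : 0u; /* wrapped => carry out of low word */
    r.hi = a.hi + b.hi + carry;    /* high words plus the carry */
    return r;
}
```

So one 64-bit add costs at least two 32-bit adds and occupies two registers per operand, which is where the extra register pressure and lower throughput come from.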

Note that using double precision floating point, also 64-bit, is only supported in devices with compute capability 1.3 or higher (i.e. 1.3 or 2.0 at this time).

Tom