views: 769
answers: 4

I'm interested in implementing an algorithm on the GPU using HLSL, but one of my main concerns is that I would like a variable level of precision. Are there techniques out there to emulate 64-bit precision and higher that could be implemented on the GPU?

Thanks!

+6  A: 

GPUs are just beginning to support double precision in hardware, though it will continue to be much slower than single precision for the near future. A wide variety of techniques have been developed over the years to synthesize higher-precision floating point using a representation composed of multiple floats in whatever precision has fast hardware support, but the overhead is pretty substantial. IIRC, the crlibm manual has a pretty good discussion of some of these techniques, with error analysis and pseudocode. (crlibm uses them to represent numbers as more than one double-precision value, but the same techniques can be used with singles.)
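
To make the multiple-float idea concrete, here is a minimal sketch in C++ of the error-free TwoSum building block and a simplified float-float addition built on it. This is an illustration of the general technique, not code from the crlibm manual, and it assumes the compiler evaluates single-precision IEEE-754 arithmetic exactly as written (no -ffast-math style reassociation):

    #include <cstdio>

    // A "float-float" value: hi carries the leading bits and lo the
    // trailing bits, so hi + lo holds roughly twice the precision of
    // a single float.
    struct FloatFloat { float hi, lo; };

    // Knuth's TwoSum: returns s and err such that s + err == a + b exactly.
    static FloatFloat twoSum(float a, float b) {
        float s   = a + b;
        float bb  = s - a;
        float err = (a - (s - bb)) + (b - bb);
        return { s, err };
    }

    // Simplified float-float addition; production versions (see crlibm)
    // do more careful renormalization.
    static FloatFloat addFF(FloatFloat x, FloatFloat y) {
        FloatFloat s = twoSum(x.hi, y.hi);
        float lo = s.lo + x.lo + y.lo;
        return twoSum(s.hi, lo);   // renormalize so |lo| stays small
    }

    int main() {
        // 1 + 1e-9 is not representable in one float, but fits in two.
        FloatFloat a = twoSum(1.0f, 1e-9f);
        FloatFloat b = twoSum(2.0f, 3e-9f);
        FloatFloat c = addFF(a, b);
        printf("hi = %.9g, lo = %.9g\n", c.hi, c.lo);
        return 0;
    }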

Without knowing more about what you're trying to do, it's hard to give a better answer. For some algorithms, only one small part of the computation needs high accuracy; if you're in a case like that, it might be possible for you to get decent performance on the GPU, though the code won't necessarily be very pretty or easy to work with. If you need high precision pervasively throughout your algorithm, then the GPU probably isn't an attractive option for you at the moment.

Finally, why HLSL and not a compute-oriented language like CUDA or OpenCL?

Stephen Canon
The code is going to process iterated fractal systems, so it needs high precision consistently. I'm looking for a performance increase over processing on the CPU. As for CUDA and OpenCL, I'm just more familiar with HLSL at the moment, though I'm considering doing it in CUDA. I've dabbled in CUDA before, but I can't say I'm anywhere near proficient.
Mark
If you need high precision consistently, it is likely impossible to beat well-written code running on the CPU at present. Your time is probably better spent profiling execution on the CPU and tuning performance there.
Stephen Canon
Not that writing GPGPU code isn't worthwhile on its own merits; just that you really want to choose a problem where you won't be trying to make the hardware do something it isn't designed to do.
Stephen Canon
A: 

ATI's Stream SDK supports some native double precision, but it's not HLSL.

The catches are that:

  • not all GPUs have double precision hardware; only the higher-end cards like the HD 4870 do
  • not all double precision operations are available; for example, there is no divide instruction

OpenCL will support double precision as an extension, but that's still in beta.
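
For reference, this is how a kernel opts into doubles once an implementation actually exposes the cl_khr_fp64 extension. The kernel name and body here are made up for illustration, and the host-side setup is omitted:

    // OpenCL C kernel source held in a C++ raw string literal. The pragma
    // enables the cl_khr_fp64 extension so the kernel may use double.
    const char* kernelSrc = R"CLC(
        #pragma OPENCL EXTENSION cl_khr_fp64 : enable
        __kernel void scale(__global double* data, double factor) {
            size_t i = get_global_id(0);
            data[i] *= factor;
        }
    )CLC";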

Die in Sente
OpenCL isn't in beta; some individual implementations of the spec are in beta, but OpenCL is a standard, not a specific implementation. There is also a non-beta implementation for OS X in Snow Leopard.
Stephen Canon
@stephentyrone: You're right. But as far as I know, all *implementations* of OpenCL that support double precision *on the GPU* (not the CPU) are still under development. I have no first-hand knowledge of what is and isn't supported in Snow Leopard. If I'm misinformed, please post the details.
Die in Sente
+1  A: 

Using two floats (i.e. single-precision values), you can achieve about 48 bits of precision. This approaches the precision of a double, but many of the operations you can implement for this "double-single" data type are slow and less precise than using real doubles. However, for simple arithmetic operations, they are usually sufficient.

This paper talks a bit about the idea and describes how to implement the multiplication operation. For a more complete list of operations you can perform and how to implement them, check out the DSFUN90 package here. The package is written in Fortran 90, but can be translated to anything that has single-precision numbers. Be aware, though, that you must license the library from them for commercial use. I believe the Mersenne Twister CUDA demo application also includes implementations of the addition and multiplication operations.
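
To illustrate the kind of routine such a package provides (this is a hand-written C++ sketch of the standard technique, not a translation of the licensed Fortran code), a float-float multiply can be built from an exact TwoProd step. Here the exact residual of the product comes from a fused multiply-add; DSFUN90 itself uses a constant-based splitting trick instead, since Fortran 90 has no FMA intrinsic:

    #include <cmath>
    #include <cstdio>

    struct FloatFloat { float hi, lo; };  // value = hi + lo, with |lo| << |hi|

    // TwoProd: p + err == a * b exactly. fmaf computes a*b - p with a single
    // rounding, so err recovers exactly the bits the rounded product dropped.
    static FloatFloat twoProd(float a, float b) {
        float p   = a * b;
        float err = std::fmaf(a, b, -p);
        return { p, err };
    }

    // Float-float multiply, keeping leading-order terms only; the x.lo * y.lo
    // term falls below the precision of the result and is dropped.
    static FloatFloat mulFF(FloatFloat x, FloatFloat y) {
        FloatFloat p = twoProd(x.hi, y.hi);
        p.lo += x.hi * y.lo + x.lo * y.hi;
        float hi = p.hi + p.lo;            // renormalize the pair
        float lo = p.lo - (hi - p.hi);
        return { hi, lo };
    }

    int main() {
        // pi split across two floats: the second float holds the residual.
        FloatFloat pi = { 3.14159274f, -8.7422780e-8f };
        FloatFloat sq = mulFF(pi, pi);
        printf("pi^2 ~ %.9g + %.9g\n", sq.hi, sq.lo);
        return 0;
    }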

Eric
A: 

This is a slightly off-topic answer, but if you want to see how your problem is going to be impacted by switching some operations to single-precision arithmetic, you should think about using interval arithmetic to empirically measure the uncertainty bounds when you mix precision in various ways. Boost has an interval arithmetic library that I once used to instrument an existing C++ scientific code; it was quite easy to use.
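
As a minimal sketch of what that looks like (using Boost.Interval's default policies; the accumulation loop and values are just for illustration, and the code should be built without aggressive floating-point optimizations so the rounding-mode control works):

    #include <boost/numeric/interval.hpp>
    #include <iostream>

    using Interval = boost::numeric::interval<float>;

    int main() {
        // Sum the same value a million times. Each interval addition rounds
        // the lower bound down and the upper bound up, so the final width
        // brackets the accumulated single-precision rounding error.
        Interval x(0.1f);
        Interval acc(0.0f);
        for (int i = 0; i < 1000000; ++i)
            acc += x;
        std::cout << "sum in [" << lower(acc) << ", " << upper(acc)
                  << "], width " << width(acc) << "\n";
        return 0;
    }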

But be warned: interval arithmetic is notoriously pessimistic, i.e. it sometimes greatly exaggerates the bounds. Affine arithmetic is supposed to be better, but I never found a usable library for it.

tramdas