GPUs are just beginning to support double precision in hardware, and it will remain much slower than single precision for the near future. A variety of techniques have been developed over the years to synthesize higher-accuracy floating point from a representation composed of multiple floats in whatever precision has fast hardware support, but the overhead is substantial. IIRC, the crlibm manual has a good discussion of some of these techniques, with error analysis and pseudocode (crlibm uses them to represent numbers as more than one double-precision value, but the same techniques work with single precision).
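To give a concrete feel for those multi-float techniques, here's a minimal sketch of "double-single" (float-float) arithmetic in CUDA, built on Knuth's TwoSum and an FMA-based TwoProd. The type and function names (`dsfloat`, `two_sum`, `two_prod`, `ds_add`) are mine for illustration, not crlibm's API, and the addition shown is the quick variant rather than the most carefully error-bounded one:

```cuda
// NB: compile without --use_fast_math; aggressive FP contraction/reordering
// would optimize away the error terms these routines depend on.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical "double-single" type: a value stored as the unevaluated
// sum hi + lo of two floats (lo holds the rounding error of hi).
struct dsfloat { float hi, lo; };

// Knuth's TwoSum: computes a + b exactly as hi + lo, branch-free,
// with no assumption about the relative magnitudes of a and b.
__host__ __device__ dsfloat two_sum(float a, float b) {
    float s = a + b;
    float v = s - a;
    float e = (a - (s - v)) + (b - v);
    return { s, e };
}

// TwoProd via fused multiply-add: computes a * b exactly as hi + lo.
__host__ __device__ dsfloat two_prod(float a, float b) {
    float p = a * b;
    float e = fmaf(a, b, -p);  // exact rounding error of the product
    return { p, e };
}

// Add two double-single values, then renormalize so |lo| <= 0.5 ulp(hi).
// This is the common fast variant; crlibm-style libraries also have
// slower, more carefully error-bounded versions.
__host__ __device__ dsfloat ds_add(dsfloat x, dsfloat y) {
    dsfloat s = two_sum(x.hi, y.hi);
    s.lo += x.lo + y.lo;
    return two_sum(s.hi, s.lo);
}

__global__ void demo(dsfloat* out) {
    // pi as a double-single value: nearest float plus its residual.
    dsfloat pi = { 3.14159274f, -8.74227766e-8f };
    *out = ds_add(pi, pi);
}

int main() {
    dsfloat* d; cudaMalloc(&d, sizeof(dsfloat));
    demo<<<1, 1>>>(d);
    dsfloat h; cudaMemcpy(&h, d, sizeof(dsfloat), cudaMemcpyDeviceToHost);
    printf("2*pi ~= %.9g + %.9g\n", (double)h.hi, (double)h.lo);
    cudaFree(d);
    return 0;
}
```

The lo component carries the rounding error that hi can't represent, which is where the extra accuracy comes from; the cost is several single-precision operations per logical operation, which is the overhead mentioned above.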
Without knowing more about what you're trying to do, it's hard to give a better answer. For some algorithms, only one small part of the computation needs high accuracy; if that's your situation, you may be able to get decent performance on the GPU, though the code won't necessarily be pretty or easy to work with. If you need high precision pervasively throughout the algorithm, then the GPU probably isn't an attractive option for you at the moment.
Finally, why HLSL and not a compute-oriented language like CUDA or OpenCL?