All the encoders you list are byte oriented, and are thrown off by a couple of properties of doubles. First there is the layout: the 12-bit sign/exponent field doesn't line up with byte boundaries. Second there is the noisiness of your input. The first part is easy to deal with in a multitude of ways; the second will limit the effectiveness of any lossless compression you throw at it. I think that even the best result will be less than amazing; I don't know your data, but I'd suspect you can count on a saving of merely 25% or so.
Off the top of my head, and perhaps useless because you have already thought of everything on this list...
Treat the stream as 64-bit unsigned integers and delta-encode adjacent values. If you have runs of values with the same exponent, the subtraction will effectively zero it out, along with possibly some high mantissa bits. There will be overflows, but the data still needs only 64 bits and the operation can be reversed.
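A minimal sketch of those two operations, assuming the doubles are already sitting in an array (all names here are mine, not from any particular library):

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Reinterpret the doubles as raw 64-bit patterns; same size, so a plain
   memcpy is enough and avoids aliasing trouble. */
static void doubles_to_bits(const double *src, uint64_t *dst, size_t n)
{
    memcpy(dst, src, n * sizeof *dst);
}

/* In-place delta encode: every value except the first becomes the wrapped
   difference from its predecessor.  Overflow is harmless, it wraps mod 2^64. */
static void delta_encode(uint64_t *v, size_t n)
{
    for (size_t i = n; i-- > 1; )
        v[i] -= v[i - 1];
}

/* Exact inverse: a running prefix sum, again mod 2^64. */
static void delta_decode(uint64_t *v, size_t n)
{
    for (size_t i = 1; i < n; i++)
        v[i] += v[i - 1];
}
```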
At this stage you can optionally try some crude integer prediction and store only the differences from the predictions.
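One possible crude predictor, purely as an illustration (the linear extrapolation is my choice, the question doesn't prescribe one), applied to whatever integer stream you have at this point:

```c
#include <stdint.h>
#include <stddef.h>

/* Predict v[i] as v[i-1] + (v[i-1] - v[i-2]) on the raw bit patterns and
   keep only the residual.  Wraparound keeps this exactly reversible. */
static void predict_encode(uint64_t *v, size_t n)
{
    for (size_t i = n; i-- > 2; ) {
        uint64_t pred = 2 * v[i - 1] - v[i - 2];
        v[i] -= pred;
    }
}

static void predict_decode(uint64_t *v, size_t n)
{
    for (size_t i = 2; i < n; i++) {
        uint64_t pred = 2 * v[i - 1] - v[i - 2];
        v[i] += pred;
    }
}
```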
If you have followed the previous suggestion, you will have almost half the values starting with 000... and almost half with FFF... To eliminate that, fold the sign into the lowest bit: rotate the value left (ROL) by 1 bit and, if the LSB is now 1, XOR every bit above it with Fs. The reverse is the same conditional XOR followed by a ROR.
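This is the same trick as zigzag encoding of the signed residuals; here is a sketch using an equivalent shift-based formulation (names mine):

```c
#include <stdint.h>

/* Fold the sign bit into the LSB (zigzag): small-magnitude two's-complement
   residuals, whether positive or negative, end up with many leading zero bits. */
static uint64_t zigzag_encode(uint64_t u)
{
    return (u >> 63) ? ~(u << 1) : (u << 1);
}

static uint64_t zigzag_decode(uint64_t z)
{
    return (z & 1) ? ~(z >> 1) : (z >> 1);
}
```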
On second thought, simply XORing the predictions with the true values can be better than taking differences, because then you don't need the sign-folding step above.
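A sketch of the XOR variant, using the previous value as the predictor for brevity (names mine); because the XOR of similar patterns never yields an FFF... prefix, no sign folding is needed afterwards:

```c
#include <stdint.h>
#include <stddef.h>

/* Store v[i] XOR prediction instead of a difference.  Decoding applies
   the same predictor and XORs again; no sign handling required. */
static void xor_encode(uint64_t *v, size_t n)
{
    for (size_t i = n; i-- > 1; )
        v[i] ^= v[i - 1];          /* predictor: the previous value */
}

static void xor_decode(uint64_t *v, size_t n)
{
    for (size_t i = 1; i < n; i++)
        v[i] ^= v[i - 1];
}
```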
You can try reordering the bytes to group bytes of the same significance together: first all the most significant bytes, then all the second-most significant, and so on. At the very least you should get something like a massive run of zeroes up front, with at most a few bits of noise.
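A sketch of that byte shuffle (names mine): emit byte 7 of every value, then byte 6 of every value, and so on, so the near-constant high bytes end up next to each other:

```c
#include <stdint.h>
#include <stddef.h>

/* Transpose n 64-bit values into 8 planes of n bytes each:
   plane 0 holds every value's most significant byte, plane 7 the least. */
static void transpose_bytes(const uint64_t *v, size_t n, uint8_t *out)
{
    for (int b = 0; b < 8; b++)                 /* b = 0 -> MSB ... 7 -> LSB */
        for (size_t i = 0; i < n; i++)
            out[(size_t)b * n + i] = (uint8_t)(v[i] >> (8 * (7 - b)));
}

static void untranspose_bytes(const uint8_t *in, size_t n, uint64_t *v)
{
    for (size_t i = 0; i < n; i++)
        v[i] = 0;
    for (int b = 0; b < 8; b++)
        for (size_t i = 0; i < n; i++)
            v[i] |= (uint64_t)in[(size_t)b * n + i] << (8 * (7 - b));
}
```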
Run the result through a generic compressor, or even do RLE on the runs of zeroes first and then an entropy encoder such as Huffman, or better, the range encoder from 7zip/LZMA.
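One possible shape for the zero-run RLE before the entropy coder; the format here (a 0x00 byte followed by a run length) is only an illustration, not any particular tool's format:

```c
#include <stdint.h>
#include <stddef.h>

/* Copy src to dst, replacing each run of zero bytes with 0x00 followed by
   the run length (1..255); longer runs are split.  Returns bytes written.
   dst must be able to hold up to 2 * len bytes in the worst case. */
static size_t rle_zeros(const uint8_t *src, size_t len, uint8_t *dst)
{
    size_t o = 0;
    for (size_t i = 0; i < len; ) {
        if (src[i] != 0) {
            dst[o++] = src[i++];
        } else {
            size_t run = 0;
            while (i < len && src[i] == 0 && run < 255) { i++; run++; }
            dst[o++] = 0x00;
            dst[o++] = (uint8_t)run;
        }
    }
    return o;
}
```

The decoder simply reads a length byte after every 0x00, so the transform is exactly reversible.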
There is one good thing about your data: it is monotonic. There is a bad thing about your data: it is simply too small a set. How much do you want to save, mere kilobytes? What for? The compression effectiveness will suffer a lot if adjacent values often differ in exponent.
If you are processing a large number of those data sets, you should consider exploiting their similarity to compress them better together, perhaps by interleaving them at some stage. If you can live with some loss, zeroing out a few of the least significant bytes might be a good idea, perhaps both on the source data and on the predictions so that you don't reintroduce noise there.
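If you do go the lossy route, a sketch of the masking (names mine); the number of dropped bytes is a knob you would tune, and applying the same mask to the predictor output, as suggested, keeps the discarded noise from leaking back into the residuals:

```c
#include <stdint.h>
#include <stddef.h>

/* Zero the k least significant bytes of every value (assumes 0 <= k < 8).
   Lossy: the discarded bits are gone for good, but the residuals
   afterwards compress much better. */
static void drop_low_bytes(uint64_t *v, size_t n, int k)
{
    uint64_t mask = ~(uint64_t)0 << (8 * k);    /* keep the top 8 - k bytes */
    for (size_t i = 0; i < n; i++)
        v[i] &= mask;
}
```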