views:

161

answers:

4

Hello all,

In Redis (http://code.google.com/p/redis) there are scores associated with elements, so that the elements can be retrieved in sorted order. These scores are doubles, even though many users actually sort by integers (for instance, Unix times).

When the database is saved we need to write these doubles to disk. This is what is used currently:

  snprintf((char*)buf+1,sizeof(buf)-1,"%.17g",val);

Additionally, infinity and not-a-number conditions are checked so that they can also be represented in the final database file.

Unfortunately, converting a double into its string representation is pretty slow, while we have a function in Redis that converts an integer into a string representation much faster. So my idea was to check whether a double can be cast to an integer without loss of data, and if so, use the integer-to-string function instead.

For this to provide a good speedup, the test for integer "equivalence" must of course be fast. So I used a trick that is probably undefined behavior but that has worked very well in practice. Something like this:

double x = ...; /* some value */
if (x == (double)((long long)x))
    use_the_fast_integer_function((long long)x);
else
    use_the_slow_snprintf(x);

My reasoning is that the cast above converts the double into a long long, and then back into a double. If the value fits the range and has no fractional part, the number survives the round trip and compares exactly equal to the initial number.

As I wanted to make sure this would not break things on some system, I joined #c on freenode and got a lot of insults ;) so now I'm trying here.

Is there a standard way to do what I'm trying to do without going outside ANSI C? Otherwise, is the above code supposed to work on all the POSIX systems that Redis currently targets? That is, the architectures where Linux / Mac OS X / *BSD / Solaris run nowadays?

What I can add to make the code saner is an explicit check on the range of the double before attempting the cast at all.

Thank you for any help.

+1  A: 

I don't see a problem with the casts, as long as x is within the range of long long. You might also check out the modf() function, which separates a double into its integral and fractional parts. You can then check the integral part against (double)LLONG_MIN and (double)LLONG_MAX to make sure, though there may be difficulties with the precision of double.

But before doing any of this, have you made sure it actually is a bottleneck by measuring its performance? And is the percentage of integer values high enough to really make a difference?

Secure
Thank you very much; this is already implemented and led to a 2x speedup when saving DBs with many doubles. The optimization started after a profiling session in which the snprintf() function showed up as very slow...
antirez
+4  A: 

Perhaps some old-fashioned fixed-point math could help you out. If you converted your double to a fixed-point value, you would still get decimal precision, and converting to a string would be as easy as with ints, with the addition of a single shift.

Another thought would be to roll your own snprintf()-style function. The conversion from double to int is natively supported by many FPUs, so it should be lightning fast. Converting that to a string is simple as well.

Just a few random ideas for you.

Michael Dorgan
Thanks Michael. Wow, the FPU supports this conversion? That is great news indeed. The trick of separating the parts and printing them independently is cool too. Thanks, this is very helpful.
antirez
+2  A: 

The problem with doing that is that the comparisons won't work out the way you'd expect. Just because one floating point value is less than another doesn't mean that its representation as an integer will be less than the other's. Also, I see you comparing one of the (former) double values for equality. Due to rounding and representation errors in the low-order bits, you almost never want to do that.

If you are just looking for some kind of key for something like hashing, it would probably work out fine. If you actually care about which values are really greater or lesser, it's a bad idea.

T.E.D.
Yes, I noticed the double comparison for equality too. It can be the source of nasty difficult to find problems. It will work 99 times out of 100.
Jim Tshr
Hello Ted, if you check the code, I'm actually always comparing doubles, but one of them is obtained after a two-step cast. The idea is: if the double still matches, it passed that "filter" without losing information, so its long long representation can be printed instead. The only reason for doing this is to gain speed in the double -> string conversion stage.
antirez
Reading about comparing doubles: does this fail even when the numbers have exactly the same representation? I understand that if two numbers are produced from string representations or by other math processing, the comparison may fail while the numbers are still epsilon-wise the same, but in my specific case a false negative is not a problem, as I'll just fall back to the sane code using snprintf(). If I understand the issue correctly, the problem with comparing doubles is false negatives, not false positives.
antirez
A: 

Your test is perfectly fine (assuming you have already separately handled infinities and NaNs by this point) - and it's probably one of the very few occasions when you really do want to compare floats for equality. It doesn't invoke undefined behaviour - even if x is outside of the range of long long, you'll just get an "implementation-defined result", which is OK here.

The only fly in the ointment is that negative zero will end up as positive zero (because negative zero compares equal to positive zero).

caf
Thanks caf, I have studied floating-point number representations a bit in these last hours, and I also think this is safe. Yes, there are explicit checks for NaN and Infinity already in the code, so it should be safe. Just to make things look a bit safer, I added an #if check that the mantissa and long long precisions match, so the code is only compiled if double has at least 52 bits of precision. Then there is an explicit test to check whether the double is inside the range where a long long will not overflow (and this test is done on a 52-bit range, so we are guaranteed it will work). Thx for the reply.
antirez
The range testing is unnecessary - you could even do it with `char` instead if you wanted to. The implementation-defined result that you get when the `double` is out of range simply won't compare equal when converted back to `double` ("overflow" in C only happens on calculations, not conversions).
caf