I ran across an interesting issue in some of my "humanize_bytes()" code. This loop reproduces the issue without all the other logic. The loop needs to stop once the byte count has been reduced to a "human readable" level, so it keeps iterating until the final value is less than 1024 (or whatever bytesize is specified).

I started looking into the problem when the function output "1024.0 P" for 1024 petabytes. At first I thought I had accidentally used <= instead of <, but on further inspection I found that something more interesting was happening.

This code reproduces the problem. I'm using perl 5.8.8.

use strict;

my $bytesize = 1024;
my $final = 1152921504606846720;
while (1) {
    printf "bytesize %%d: %d %%f: %s %s final %%d: %19d %%f: %26f\n",
        $bytesize,$bytesize,
        (
            $bytesize == $final ? '==' :
            $bytesize > $final  ? '>'  :
            $bytesize < $final  ? '<'  :
            '<error>'
        ),
        $final,$final;
    last if $final < $bytesize;
    $final /= $bytesize;
}
printf "final = bytesize d:%d f:%s %s final d:%d f:%f\n",
    $bytesize,$bytesize,
    (
        $bytesize == $final ? '==' :
        $bytesize > $final  ? '>'  :
        $bytesize < $final  ? '<'  :
        '<error>'
    ),
    $final,$final;

The output I receive is:

bytesize %d: 1024 %f: 1024 < final %d: 1152921504606846720 %f: 1152921504606846720.000000
bytesize %d: 1024 %f: 1024 < final %d:    1125899906842623 %f:    1125899906842623.750000
bytesize %d: 1024 %f: 1024 < final %d:       1099511627775 %f:       1099511627775.999756
bytesize %d: 1024 %f: 1024 < final %d:          1073741823 %f:          1073741824.000000
bytesize %d: 1024 %f: 1024 < final %d:             1048575 %f:             1048576.000000
bytesize %d: 1024 %f: 1024 > final %d:                1023 %f:                1024.000000
final = bytesize d:1024 f:1024 > final d:1023 f:1024.000000

The thing to notice here is that the final value is 1023 when printed as a decimal integer (%d), but 1024 when printed as a float (%f). How can that be? And evidently Perl is using the integer representation for the comparison.

+2  A: 

Perl appears to be rounding the value passed to %f. If you wrap your value in int($final), you will get 1023 as output, which indicates that the %d conversion is doing the right thing (it always drops the fractional part, truncating toward zero).
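
As a minimal sketch of the difference between the conversions, the snippet below formats one value three ways. The value 1023.9999999 is an arbitrary stand-in chosen for illustration; it is not the exact number the loop in the question produces.

```perl
use strict;
use warnings;

# A value just below 1024, chosen only for illustration.
my $value = 1023.9999999;

my $as_d   = sprintf "%d", $value;   # %d drops the fractional part
my $as_f   = sprintf "%f", $value;   # %f rounds to six decimal places
my $as_int = int($value);            # int() also drops the fractional part

print "%d: $as_d  %f: $as_f  int(): $as_int\n";
```

Here %d and int() both give 1023, while %f gives 1024.000000, because the seventh decimal place rounds up and carries all the way into the integer part.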

PP
+10  A: 

Your original value is not 1024 petabytes, it is 256 less. (I noticed this by popping it into dc(1) and printing it in hex: 0xFFFFFFFFFFFFF00.)

Consequently, each time through the loop your number is slightly different than you are expecting, and in the end it is slightly smaller.

If you had more precision, you would end up with

1023.999999999999772626324556767940

This naturally truncates to 1023 and rounds to 1024.
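
The same check can be sketched in Perl itself. This is only an illustration: it uses the core Math::BigInt module in place of dc(1) for the integer arithmetic, and it repeats the six divisions from the question's loop. The variable names are mine, and the number of digits printed by the long format may vary with the platform's C library.

```perl
use strict;
use warnings;
use Math::BigInt;

# The starting value from the question, compared against 2**60 (1024 PB).
my $start = Math::BigInt->new("1152921504606846720");
my $exa   = Math::BigInt->new(2)->bpow(60);
my $short = $exa - $start;   # how far the start falls short of 2**60

print "hex:       ", $start->as_hex, "\n";
print "shortfall: $short\n";

# Repeat the six divisions from the question's loop in floating point.
my $final = 1152921504606846720;
$final /= 1024 for 1 .. 6;

printf "%%.20f: %.20f\n", $final;   # more digits than %f's default six
printf "%%d:    %d\n",    $final;   # drops the fractional part
printf "%%f:    %f\n",    $final;   # rounds to six decimal places
```

The shortfall comes out as 256, and the last two lines show the same 1023 versus 1024.000000 split as in the question.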

DigitalRoss
While you are also right, I think PP's answer better describes what's going on with the different format strings involved, which is closer to what my question was asking.
xyld
You should also be aware that 1 exabyte requires 61 bits to represent, and an IEEE754 double has only 53 bits of precision, counting the hidden bit.
DigitalRoss