views:

352

answers:

5

For a simple project I have to make large numbers (e.g. 4294967123) readable, so I'm writing only the first digits with a prefix (4294967123 -> 4.29G, 12345 -> 12.34K etc.)

The code (simplified) looks like this:

const char* postfixes=" KMGT";
char postfix(unsigned int x)
{
     return postfixes[(int) floor(log10(x))];
}

It works, but I think that there's a more elegant/better solution than computing the full precision logarithm, rounding it and casting it down to an int again.

Other solutions I thought of:

int i=0;
for(; x >= 1000 ; ++i) x/=1000;
return postfixes[i];

(This is significantly slower, but easier to read)

The numbers are distributed between according to Benford's Law and the number should be treated as unsigned 64 bit-number, as there should be no rounding error near 10^x (e.g. in python math.log(1000,10) returns 2.999996, which gets floored to 2). Is there any fast, accurate other way I'm missing?

A: 

Convert the number into a string and use the strings length. This is certainly not faster, but will be very accurate. You can then go on and use the string directly to build the result by slicing it appropriately.

David Schmitt
what about rounding up or down at the significant point?
paul
A: 

This is the most straightforward and simple method i can think of ... and maybe it will be a bit faster than computing the logarithm:

postfixes = {{1e12, "T"},
             {1e9,  "G"},
             {1e6,  "M"},
             {1e3,  "K"}}

for each postfix in postfixes{
    if(x > postfix.value){
        return (x / postfix.value) + postfix.letter;
    }
}

return x;
cube
+1  A: 

Don't fiddle with the number, instead s(n)printf the number into a string using "%E", then substitute appropriately for E+00 E+03 E+09 (etc) (IIRC, you should only get powers of 3 with scientific notation - which is what you want).

char number_buff[30];
snprintf(number_buff, 29, "%E", x);
char *powered_number_string = substitute_powers(number_buff);

char *substitute_powers(const char *number_buff) is messy in C.

sed would be something like

-e s/E+0// -e s/E+3/K/ -e s/E+6/M/ -e s/E+9/G/

it works, however it is way slower than the for-loop and it it's quite hard to understand in the beginning.
tstenner
+7  A: 

Your log10/floor code is perfectly readable, and its performance cost will likely be dwarfed by that of the string formatting you will subsequently be doing on your output.

However, suppose you were to really need the performance...

Note that log10(x) == log2(x) / log2(10) == log2(x) * 1/log2(10)

1/log2(10) is a constant

log2(x) can usually be performed cheaply in the integer pipeline on modern architectures using instructions such as CLZ or a bit twiddling hack, yielding a number between 0 and 63 for a 64-bit integer. That fits in 6 bits, leaving us up to 58 bits after the radix point usable for fixed point arithmetic in a 64-bit type.

So we can then use fixed-point arithmetic to find the log10:

unsigned long long integer_log10( unsigned long long _in )
{
    unsigned long long log10fp6x58 = 0x134413509f79ff0llu; // (unsigned long long) (double(1llu<<58) / log2(10.0))
    return (((integer_log2(_in)) * log10fp6x58)+(1llu<<57)) >> 58;
}

The implementation of integer_log2 is compiler/platform-dependent; e.g. on GCC/PowerPC, it's

unsigned long long integer_log2( unsigned long long _in )
{
    return 63 - __cntlzd(_in);
}

This approach can be generalised for finding the logarithm of any base, simply calculate the appropriate constant as described above.

moonshadow
It'll be probably better to change:unsigned long long log10fp6x58 =>static const unsigned long long log10fp6x58
Drakosha
A: 

First of all, should you need to format a zero, you don't want to be taking the logarithm of that. Second, you want something pretty, so you don't want, for example, "1000M" for 999,800,000. Third, you probably want rounding.

I suggest that you use something like this pseudocode:


function format(long x by value)
int p=5, char suf
if x<100000 then return string(x)
if x>=10000000000000 then
   x/=100000000
   p+=8
if x>=1000000000 then
   x/=10000
   p+=4
if x>=10000000 then
   x/=100
   p+=2
if x>=1000000 then
   x/=10
   p+=1
x+=5
if x>=100000 then
   x/=10
   p+=1
switch(p/3)
   6: suf='E'
   5: suf='P'
   4: suf='T'
   3: suf='G'
   2: suf='M'
   1: suf='K'
switch(p mod 3)
   2: return format("000 A",x/1000,suf)
   1: return format("00.0 A",x/10000,(x%10000)/100,suf)
   0: return format("0.00 A",x/100000,(x%100000)/100,suf)
end function
Robert L