views:

1100

answers:

11

I have some code that does a lot of comparisons of 64-bit integers, however it must take into account the length of the number, as if it was formatted as a string. I can't change the calling code, only the function.

The easiest way (besides .ToString().Length) is:

(int)Math.Truncate(Math.Log10(x)) + 1;

However that performs rather poorly. Since my application only sends positive values, and the lengths are rather evenly distributed between 2 and 9 (with some bias towards 9), I precomputed the values and have if statements:

static int getLen(long x) {
    if (x < 1000000) {
        if (x < 100) return 2;
        if (x < 1000) return 3;
        if (x < 10000) return 4;
        if (x < 100000) return 5;
        return 6;
    } else {
        if (x < 10000000) return 7;
        if (x < 100000000) return 8;
        if (x < 1000000000) return 9; 
        return (int)Math.Truncate(Math.Log10(x)) + 1; // Very uncommon
    }
}

This lets the length be computed with an average of 4 compares.

So, are there any other tricks I can use to make this function faster?

Edit: This will be running as 32-bit code (Silverlight).

Update:

I took Norman's suggestion and changed the ifs around a bit to result in an average of only 3 compares. As per Sean's comment, I removed the Math.Truncate. Together, this boosted things about 10%. Thanks!

+5  A: 

Two suggestions:

  1. Profile and put the common cases first.
  2. Do a binary search to minimize the number of comparions in the worst case. You can decide among 8 alternatives using exactly three comparisons.

This combination probably doesn't buy you much unless the distribution is very skew.

Norman Ramsey
Nice. I laid out the ifs differently and that helped a bit. Thanks!
MichaelGG
There's not much skewing, but re-organizing the ifs removed a compare on average.
MichaelGG
A: 

not sure if this is faster or not.. but you can always count...

static int getLen(long x) {
    int len = 1;
    while (x > 0) {
        x = x/10;
        len++
    };
    return len
}
Tracker1
A: 

What about this?

int MAX_VALUE = //set this to the max value of an integer on your platform
int exponent = 1;

while (exponent < MAX_VALUE)
{
    if (x < 10)
       return 1;
    if (x < 10 ^ exponent)
       return (exponent + 1); 
}
amdfan
10 ^ exponent is too slow, it would be better to use some sort of bit-shifting.
Karl
for(i=10;value<i;i=(i<<3)+(i<<1)) {...} could be faster.
liori
A: 

What do you mean by length? Number of zeros or everything? This does significant figures, but you get the general idea

public static string SpecialFormat(int v, int sf)  
{  
     int k = (int)Math.Pow(10, (int)(Math.Log10(v) + 1 - sf));  
     int v2 = ((v + k/2) / k) * k;  
     return v2.ToString("0,0");  
}
Chris S
Yea, complete length as if you just did ToString().Length. The problem is that calling Math.Log10 kills performance. Just using the Log10 way results in code that's many times slower.
MichaelGG
+1  A: 

Here's a binary-search version, which I have tested, which works on 64-bit integers using exactly five comparisons each time.

int base10len(uint64_t n) {
  int len = 0;
  /* n < 10^32 */
  if (n >= 10000000000000000ULL) { n /= 10000000000000000ULL; len += 16; }
  /* n < 10^16 */
  if (n >= 100000000) { n /= 100000000; len += 8; }
  /* n < 100000000 = 10^8 */
  if (n >= 10000) { n /= 10000; len += 4; }
  /* n < 10000 */
  if (n >= 100) { n /= 100; len += 2; }
  /* n < 100 */
  if (n >= 10) { return len + 2; }
  else         { return len + 1; }
}

I doubt this is going to be any faster than what you're already doing. But it's predictable.

Norman Ramsey
I think the divides kill it -- it goes about twice as slow as the pure if/return way.
MichaelGG
I'm not surprised. The algorithm is designed for log base 2, in which case the divides can be replaced by shifts, which are typically much faster. But it sure is pretty :-)
Norman Ramsey
A: 

You commented in code that 10 digits or more is very uncommon, so your original solution is not bad

GogaRieger
Yea, it's not _bad_ per-se, I was just hoping to tweak it a bit more.
MichaelGG
+2  A: 

I did some testing, and this seems to be 2-4 times faster than the code that you have now:

static int getLen(long x) {
 int len = 1;
 while (x > 9999) {
  x /= 10000;
  len += 4;
 }
 while (x > 99) {
  x /= 100;
  len += 2;
 }
 if (x > 9) len++;
 return len;
}

Edit:
Here is a version that uses more Int32 operations, that should work better if you don't have an x64 application:

static int getLen(long x) {
 int len = 1;
 while (x > 99999999) {
  x /= 100000000;
  len += 8;
 }
 int y = (int)x;
 while (y > 999) {
  y /= 1000;
  len += 3;
 }
 while (y > 9) {
  y /= 10;
  len ++;
 }
 return len;
}
Guffa
Hmm, I plugged that version in and speed dropped considerably.
MichaelGG
Hmm, worst case should be only slightly slower... Perhaps it's because your actual data looks completely different from the random test data that I created, also I ran it as x64 which has a smaller penalty for Int64 operations. You can try the new version that I posted if you like.
Guffa
A: 

I haven't tested this, but the change of base law says:

Log10(x) = Log2(x) / Log2(10)

Log2 should be a bit faster than Log10 if it's implemented right.

+1  A: 

From Sean Anderson's Bit Twiddling Hacks:

Find integer log base 10 of an integer

unsigned int v; // non-zero 32-bit integer value to compute the log base 10 of 
int r;          // result goes here
int t;          // temporary

static unsigned int const PowersOf10[] = 
    {1, 10, 100, 1000, 10000, 100000,
     1000000, 10000000, 100000000, 1000000000};

t = (IntegerLogBase2(v) + 1) * 1233 >> 12; // (use a lg2 method from above)
r = t - (v < PowersOf10[t]);

The integer log base 10 is computed by first using one of the techniques above for finding the log base 2. By the relationship log10(v) = log2(v) / log2(10), we need to multiply it by 1/log2(10), which is approximately 1233/4096, or 1233 followed by a right shift of 12. Adding one is needed because the IntegerLogBase2 rounds down. Finally, since the value t is only an approximation that may be off by one, the exact value is found by subtracting the result of v < PowersOf10[t].

This method takes 6 more operations than IntegerLogBase2. It may be sped up (on machines with fast memory access) by modifying the log base 2 table-lookup method above so that the entries hold what is computed for t (that is, pre-add, -mulitply, and -shift). Doing so would require a total of only 9 operations to find the log base 10, assuming 4 tables were used (one for each byte of v).

As far as computing IntegerLogBase2, there are several alternatives presented on that page. It's a great reference for all sorts of highly optimized integer operations.

A variant of your version is also there, except it assumes the values (rather than the log base 10 of the values) are uniformly distributed, and therefore does an exponentially ordered search:

Find integer log base 10 of an integer the obvious way

unsigned int v; // non-zero 32-bit integer value to compute the log base 10 of 
int r;          // result goes here

r = (v >= 1000000000) ? 9 : (v >= 100000000) ? 8 : (v >= 10000000) ? 7 : 
    (v >= 1000000) ? 6 : (v >= 100000) ? 5 : (v >= 10000) ? 4 : 
    (v >= 1000) ? 3 : (v >= 100) ? 2 : (v >= 10) ? 1 : 0;

This method works well when the input is uniformly distributed over 32-bit values because 76% of the inputs are caught by the first compare, 21% are caught by the second compare, 2% are caught by the third, and so on (chopping the remaining down by 90% with each comparision). As a result, less than 2.6 operations are needed on average.

Trevor Robinson
A: 
static int getDigitCount( int x )
    {
    int digits = ( x < 0 ) ? 2 : 1; // '0' has one digit,negative needs space for sign
    while( x > 9 ) // after '9' need more
        {
        x /= 10; // divide and conquer
        digits++;
        }
    return digits;
    }
K1mp3
A: 

create a function using machine code, that will be the fastest.

weekend00