ansaurus

Question

Efficient way to determine number of digits in an integer

Answer 1

+9 A:

int digits = 0; while (number != 0) { number /= 10; digits++; }

Note: "0" will have 0 digits! Just do if (number == 0) { digits = 1; } if you need 0 to appear to have 1 digit.

In the end, use a profiler to know which of all the answers here will be faster on your machine...

squelart 2009-09-28 23:31:42

This may or may not be faster than the unrolled loop approach I took - you'd need to profile the difference (should be negligible in the long run).

Vitali 2009-09-28 23:43:50

Agreed, profiling is the only way to really know for sure! I updated my answer with that comment, as Ben S's ceil(log10()) answer has disappeared.

squelart 2009-09-28 23:49:21

Answer 2

A:

previous works for number >= 1. Original poster probably wants to handle zero. Whether or not negative numbers should include the '-' printed out in usual representation.

It's also possible the original poster wants, effectively the magnitude of the largest representable integer in an implementation, which may be different.

2009-09-28 23:32:39

Matt: this is not an answer. Perhaps you don't have enough rep to comment, but you shouldn't post comments as answers. I'm not down-voting because you're new, but you should read the FAQ linked at the top of the page if you haven't already.

Argalatyr 2009-09-28 23:35:50

Welcome to Stack Overflow! Note that answers to questions can be sorted by time or votes; the default is votes and so answers will be reordered (randomly if the vote counts are equal). Using "previous" to refer to a specific answer doesn't make sense on SO.

Greg Hewgill 2009-09-28 23:35:52

Answer 3

+18 A:

Well, the most efficient way, presuming you know the size of the integer, would be a lookup. Should be faster than the much shorter logarithm based approach. If you don't care about counting the '-', remove the + 1.

// generic solution
template <class T>
int numDigits(T number)
{
    int digits = 0;
    if (number < 0) digits = 1; // remove this line if '-' counts as a digit
    while (number) {
        number /= 10;
        digits++;
    }
    return digits;
}

// partial specialization optimization for 32-bit numbers
template<>
int numDigits(int32_t x)
{
    if (x == MIN_INT) return 10 + 1;
    if (x < 0) return numDigits(-x) + 1;

    if (x >= 10000) {
        if (x >= 10000000) {
            if (x >= 100000000) {
                if (x >= 1000000000)
                    return 10;
                return 9;
            }
            return 8;
        }
        if (x >= 100000) {
            if (x >= 1000000)
                return 7;
            return 6;
        }
        return 5;
    }
    if (x >= 100) {
        if (x >= 1000)
            return 4;
        return 3;
    }
    if (x >= 10)
        return 2;
    return 1;
}

// partial-specialization optimization for 8-bit numbers
template <>
int numDigits(char n)
{
    // if you have the time, replace this with a static initialization to avoid
    // the initial overhead & unnecessary branch
    static char x[256] = {0};
    if (x[0] == 0) {
        for (char c = 1; c != 0; c++)
            x[c] = numDigits((int32_t)c);
        x[0] = 1;
    }
    return x[n];
}

Vitali 2009-09-28 23:35:57

Probably faster than my answer, well done. For added efficiency, if you know that your input numbers will be mostly small ones (I'm guessing less than 100,000), then reverse the tests: if (x < 10) return 1; if (x < 100) return 2; etc., so that the function will do less tests and exit faster.

squelart 2009-09-28 23:44:02

Or perhaps reorder and nest the if statements, to do a binary search instead of a linear search.

Dave Hinton 2009-09-28 23:44:45

Yeah - that's even more clever.

Vitali 2009-09-29 00:51:24

That's not a good idea. What happens when the architecture expands to 256 bit integers. You need to remember to come back and modify this code. In real life that will not happen and sice this is probably going to be used to build a buffer of the correct size you are now opening yourself to all sorts of buffer over run problems on larger architectres.

Martin York 2009-09-29 00:51:39

Hence the original disclaimer that it's for 32-bit. I've added the template solution that does it generically too.

Vitali 2009-09-29 00:57:46

Vitali 2009-09-29 01:03:01

In the generic solution you should add `if (number == 0) { return 1; }` to be consistent with the specializations.

squelart 2009-09-29 01:53:40

assuming a uniform distribution of numbers, the reverse linear search ( starting from max digits to 1 ) may be faster on average than the binary search as there are quite more numbers with N digits than with N-1 digitshttp://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10Obvious

fa. 2009-09-29 12:27:01

I wouldn't worry very hard about 256 or 128 bit integers. Unless you need to count the number of electrons in the Universe (10^78 last time I did it), 64 bits will do pretty well. 32 bit machines lasted ~~15 years. I'd guess 64 bit machines will last a lot longer. For larger number, multiprecision arithmetic will be fine, and I doubt if efficiency of computing digit count will matter.

Ira Baxter 2009-09-30 09:26:30

gcc has __int128_t, useful to hold the result of multiplying two 64 bit integers (amd64 has 64bit x 64bit = 128 bit in hardware). You don't have to count real things to have big numbers; crypto routinely uses large integers.

janm 2009-10-03 06:37:36

That algorithm assumes the values are equally distributed based on the number of digits, which is almost never the case. See fa's comment. BTW, those aren't partial specializations; they're complete explicit specializations. :-)

Trevor Robinson 2009-12-12 18:07:42

Answer 4

+7 A:

The simplest way is to do:

void GetNumberOfDigits (unsigned i)
{
    return i > 0 ? (int) log10 ((double) i) + 1 : 1;
}

log10 is defined in <cmath> or <math.h>. You'd need to profile this to see if it's faster than any of the others posted here. I'm not sure how robust this is with regards to float point precision. Also, the argument is unsigned as negative values and log don't really mix.

Skizz

Skizz 2009-09-29 00:00:40

For 32 bit ints and 56 bit floats, this probably works.If the input is a long (64 bits), the 56 bits of double-precision log may cause this to produce the wrong answer in cases of values near large values of 10^n. Expect trouble above 2^50.

Ira Baxter 2009-09-29 10:07:09

There's also the question of how accurate the log functions are. I haven't checked how accurate they are in modern libraries, and wouldn't be comfortable blindly trusting them to be good to one part in a billion.

David Thornley 2009-09-29 13:49:21

Answer 5

+2 A:

A previous poster suggested a loop that divides by 10. Since multiplies on modern machines are a lot faster, I'd recommend the following code instead:

 int digits = 1, pten=10; while ( pten <= number ) { digits++; pten*=10; }

Ira Baxter 2009-09-29 01:05:25

the devil is in the details - what happens with say std::numeric_limits<int>::max == number - it might have a problem terminating

pgast 2009-09-29 02:53:30

If you are worried about that case, you can add one additional IF to handle very large values.

Ira Baxter 2009-09-29 03:46:48

I should observe that on x86 machines, a multiply by a constant 10 as used in this case can actually be implemented by the compiler as LEA R2,[8*R1+R1], ADD R1,R2 so it takes 2 clocks max. Multiplys by variables take tens of clocks, and divides are much worse.

Ira Baxter 2009-09-29 10:19:39

Answer 6

+2 A:

Practical joke: This is the most efficient way (number of digits is calculated in compilte time):

template <unsigned long long N, size_t base=10>
struct numberlength
{
 enum { value = 1 + numberlength<N/base, base>::value };
};

template <size_t base>
struct numberlength<0, base>
{
 enum { value = 0 };
};

May be useful to determine the width required for number field in formatting, input elements etc.

blinnov.com 2009-09-29 01:32:53

First, your solution doesn't work for 0.Secondly your solution is inapplicable to the general case of a variable.Thirdly, if you are using a constant literal, you already know how many digits it has.

Vitali 2009-09-29 08:02:35

It works for 0 too.It also works for any base.The rest are valid points that I already outlined.

blinnov.com 2009-09-29 13:10:25

I don't think it does actually. It fails on `0` and also fails on base `1` :) and gives divide by zero errors if the base is given as `0`. It can be fixed though. Anyway I'm nitpicking over a very old post, so sorry, it's just that I think this needn't be a joke and could actually be useful.

tjm 2010-10-07 17:46:02

Answer 7

A:

Here's a different approach:

digits = sprintf(numArr, "%d", num);    // where numArr is a char array
if (num < 0)
    digits--;

This may not be efficient, just something different than what others suggested.

Ashwin 2009-09-29 05:32:14

The request was for extremely efficient. This is the opposite.

Ira Baxter 2009-09-30 00:49:26

Answer 8

+4 A:

See Bit Twiddling Hacks for a much shorter version of the answer you accepted. It also has the benefit of finding the answer sooner if your input is normally distributed, by checking the big constants first. (v >= 1000000000) catches 76% of the values, so checking that first will on average be faster.

Josh Haberman 2009-09-29 05:58:36

It's unclear if the bit-twiddling is actually faster. Even in the worst case, my modified approach requires 4 comparisons (might be able to take it down to 3 if I examined the partitioning further, although that looks unlikely). I seriously doubt that that is going to be beat by arithmetic operations + memory loads (although if accessed enough, those disappear into the CPU cache).Remember, in the example they give, they also hide the log base 2 as some abstract IntegerLogBase2 function (which itself is actually not cheap).

Vitali 2009-09-29 08:14:33

Just as a follow up, yes if the numbers are normally distributed, doing the in-order check is faster. However, it has the degenerate case of being twice as slow in the worst case. The partitioned approach by number of digits instead of input space means that the behaviour does not have a degenerate case and always performs optimally. Furthermore, remember you are making the assumption that the numbers will be uniformly distributed. In fact, they are more likely to follow some distribution related to <a href="http://en.wikipedia.org/wiki/Benford%27s_law">Benford's</a> would be my guess.

Vitali 2009-09-29 08:19:34

The bit twiddling hacks are not faster than the partition method above, but they are potentially interesting if you had a more general case like a float here.

Corwin Joy 2009-09-30 08:25:38

The bit twiddling hacks suggests a way to get the int log10, given the int log2. It suggests several way to get int log2, mostly involving few compare/branches. (I think you're underestimating the cost of unpredictable branches, Vitali). If you can use inline x86 asm, the BSR instruction will give you the int log2 of a value (i.e. the bit index of a the most significant set bit). It's a bit slow on K8 (10 cycle latency), but fast on Core 2 (2 or 3 cycle latency). Even on K8, may well be faster than the comparisons.

Peter Cordes 2009-12-06 00:08:39

On K10, lzcnt counts leading zeros, so it's almost the same as bsr, but an input of 0 is no longer a special case with undefined results. Latencies: BSR: 4, LZCNT: 2.

Peter Cordes 2009-12-06 00:30:54

Answer 9

A:

I like Ira Baxter's answer. Here is a template variant that handles the various sizes and deals with the maximum integer values (updated to hoist the upper bound check out of the loop):

#include <boost/integer_traits.hpp>

template<typename T> T max_decimal()
{
    T t = 1;

    for (unsigned i = boost::integer_traits<T>::digits10; i; --i)
        t *= 10;

    return t;
}

template<typename T>
unsigned digits(T v)
{
    if (v < 0) v = -v;

    if (max_decimal<T>() <= v)
        return boost::integer_traits<T>::digits10 + 1;

    unsigned digits = 1;
    T boundary = 10;

    while (boundary <= v) {
        boundary *= 10;
        ++digits;
    }

    return digits;
}

To actually get the improved performance from hoisting the additional test out of the loop, you need to specialise max_decimal() to return constants for each type on your platform. A sufficiently magic compiler could optimise the call to max_decimal() to a constant, but specialisation is better with most compilers today. As it stands, this version is probably slower because max_decimal costs more than the tests removed from the loop.

I'll leave all that as an exercise for the reader.

janm 2009-09-29 06:03:47

You want to make the upper limit check a seperate conditional tested first so you don't check it on each loop iteration.

Ira Baxter 2009-09-29 10:24:48

You don't want to put 10 into that temp t. The compiler might consider multiplying by t to be multiplying by a real variable, and use a general purpose multiply instruction. If you instead wrote "result*=10;" the compiler will surely notice the multiply by constant 10 and implement that with a few shifts and adds, which is extremely fast.

Ira Baxter 2009-09-30 00:48:39

If the multiplication by t was always a multiplication by 10, then yes, the compiler could do strength reduction. However, t is not loop-invariant in this case (it is just a modification of an integer power function I had lying around). The correct optimisation is specialisation on type returning a constant. However, you are right that, in this case, the function is always raising 10 to a power, not an arbitrary integer to a power, and strength reduction gives a good win. So I made a change ... This time further changes really are left as an exercise! (Stack Overflow is a big time sink ...)

janm 2009-09-30 07:07:11

Answer 10

A:

template <typename type>
class number_of_decimal_digits {   
    const powers_and_max<type> mPowersAndMax;
public:
    number_of_decimal_digits(){
    }   
    inline size_t ndigits( type i) const {
        if(i<0){
             i += (i == std::numeric_limits<type>::min());
             i=-i;
        }
        const type* begin = &*mPowersAndMax.begin();
        const type* end = begin+mPowersAndMax.size();
        return 1 + std::lower_bound(begin,end,i) - begin;
    }
    inline size_t string_ndigits(const type& i) const {
        return (i<0) + ndigits(i);
    }
    inline size_t operator[](const type& i) const {
       return string_ndigits(i);
    }
};

where in powers_and_max we have (10^n)-1 for all n such that

(10^n) < std::numeric_limits<type>::max()

and std::numeric_limits<type>::max() in an array:

template <typename type>
struct powers_and_max : protected std::vector<type>{
    typedef std::vector<type> super;
    using super::const_iterator;
    using super::size;
    type& operator[](size_t i)const{return super::operator[](i)};
    const_iterator begin()const {return super::begin();} 
    const_iterator end()const {return super::end();} 
    powers_and_max() {
       const int size = (int)(log10(double(std::numeric_limits<type>::max())));
       int j = 0;
       type i = 10;
       for( ; j<size ;++j){
           push_back(i-1);//9,99,999,9999 etc;
           i*=10;
       }
       ASSERT(back()<std::numeric_limits<type>::max());
       push_back(std::numeric_limits<type>::max());
   }
};

here's a simple test:

number_of_decimal_digits<int>  ndd;
ASSERT(ndd[0]==1);
ASSERT(ndd[9]==1);
ASSERT(ndd[10]==2);
ASSERT(ndd[-10]==3);
ASSERT(ndd[-1]==2);
ASSERT(ndd[-9]==2);
ASSERT(ndd[1000000000]==10);
ASSERT(ndd[0x7fffffff]==10);
ASSERT(ndd[-1000000000]==11);
ASSERT(ndd[0x80000000]==11);

Of course any other implementation of an ordered set might be used for powers_and_max and if there was knowledge that there would be clustering but no knowledge of where the cluster might be perhaps a self adjusting tree implementation might be best

pgast 2009-09-29 06:10:07

Answer 11

+1 A:

The ppc architecture has a bit counting instruction. With that, you can determine the log base 2 of a positive integer in a single instruction. For example, 32 bit would be:

#define log_2_32_ppc(x) (31-__cntlzw(x))

If you can handle a small margin of error on large values you can convert that to log base 10 with another few instructions:

#define log_10_estimate_32_ppc(x) (9-(((__cntlzw(x)*1233)+1545)>>12))

This is platform specific and slightly inaccurate, but also involves no branches, division or conversion to floating point. All depends on what you need.

I only know the ppc instructions off hand, but other architectures should have similar instructions.

2009-09-29 06:30:28

This solution computes log2(15)= 4 bits and log2(9)=4 bits. But 15 and 9 require different numbers of decimal digits to print. So it doesn't work, unless you don't mind your numbers printing with too many digits sometimes. But in that case, you can always choose "10" as the answer for int.

Ira Baxter 2009-09-29 11:05:31

Answer 12

A:

effective way

int num;
int count = 0;
while(num)
{
   num /= 10;
   ++count;
}

#include <iostream>

int main()
{
   int num;
   std::cin >> num;

   std::cout << "number of digits for " << num << ": ";

   int count = 0;
   while(num)
   {
      num /= 10;
      ++count;
   }

   std::cout << count << '\n';

   return 0;
}

Davit Siradeghyan 2009-09-30 13:23:29

Answer 13

+2 A:

Perhaps I misunderstood the question but doesn't this do it?

int NumDigits(int x)  
{  
    x = abs(x);  
    return (x < 10 ? 1 :   
     (x < 100 ? 2 :   
     (x < 1000 ? 3 :   
     (x < 10000 ? 4 :   
     (x < 100000 ? 5 :   
     (x < 1000000 ? 6 :   
     (x < 10000000 ? 7 :  
     (x < 100000000 ? 8 :  
     (x < 1000000000 ? 9 :  
     10)))))))));  
}

Brad 2009-09-30 14:43:06

ansaurus

tags:

views:

answers:

Efficient way to determine number of digits in an integer

related questions