ansaurus

Question

How to count each digit in a range of integers?

Answer 1

+1 A:

You can separate each number into its digits (look here for an example), create a histogram with entries for 0..9 (which counts how many times each digit appeared in a number), and multiply by the number of 'numbers' asked for.

But if this isn't what you are looking for, can you give a better example?

Edited:

Now I think I got the problem. I think you can reckon this (pseudo C):

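int histogram[10];
memset(histogram, 0, sizeof(histogram));   // start every tally at zero

for(i = startNumber; i <= endNumber; ++i)
{
    array = separateDigits(i);             // digits of i, one per element
    for(j = 0; j < array.length; ++j)
    {
        histogram[array[j]]++;             // tally the digit itself, not its index
    }
}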

separateDigits implements the function in the link.

Each position of the histogram will have the amount of each digit. For example

histogram[0] == total of zeros
histogram[1] == total of ones

...
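A runnable version of the pseudo C above might look like the following sketch in Python. The names separate_digits and digit_histogram are illustrative only, and the digit splitting uses repeated division by 10 as an assumption about what the linked function does.

```python
def separate_digits(n):
    """Return the decimal digits of a non-negative integer, least significant first."""
    digits = []
    while True:
        n, digit = divmod(n, 10)   # peel off the lowest digit
        digits.append(digit)
        if n == 0:
            break
    return digits


def digit_histogram(start_number, end_number):
    """Tally every digit written when listing start_number..end_number inclusive."""
    histogram = [0] * 10
    for i in range(start_number, end_number + 1):
        for digit in separate_digits(i):
            histogram[digit] += 1
    return histogram


print(digit_histogram(1, 49))   # [4, 15, 15, 15, 15, 5, 5, 5, 5, 5]
```

This still visits every integer in the range, so it is the straightforward approach rather than a shortcut.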

Regards

Andres 2010-01-13 19:47:05

Example: 1 to 100 will require 10 of each digit in the ones place, 1 of each digit in the tens place, and an additional 1 for 100.

Robert Harvey 2010-01-13 19:52:42

If you consider 100, you'll have 11 zeros, 9 like you said otherwise.

Andres 2010-01-13 19:54:09

Your solution is correct, but is almost the same as mine, only avoiding the conversion to string. You still have to loop through the entire range of integers to create the histogram, that's what I want to avoid.

Carlos Gutiérrez 2010-01-13 19:54:23

You've provided the "obvious solution" outlined in the original question.

Langdon 2010-01-13 19:55:30

OK, that was the first thing that came to my mind.

Andres 2010-01-13 19:56:09

Answer 2

+8 A:

I'm assuming you want a solution where the numbers are in a range, and you have the starting and ending number. Imagine starting with the start number and counting up until you reach the end number - it would work, but it would be slow. I think the trick to a fast algorithm is to realize that in order to go up one digit in the 10^x place and keep everything else the same, you need to use all of the digits before it 10^x times plus all digits 0-9 10^(x-1) times. (Except that your counting may have involved a carry past the x-th digit - I correct for this below.)

Here's an example. Say you're counting from 523 to 1004.

First, you count from 523 to 524. This uses the digits 5, 2, and 4 once each.
Second, count from 524 to 604. The rightmost digit does 6 cycles through all of the digits, so you need 6 copies of each digit. The second digit goes through digits 2 through 0, 10 times each. The third digit is 6 5 times and 5 100-24 times.
Third, count from 604 to 1004. The rightmost digit does 40 cycles, so add 40 copies of each digit. The second from right digit does 4 cycles, so add 4 copies of each digit. The leftmost digit does 100 each of 7, 8, and 9, plus 5 of 0 and 100 - 5 of 6. The last digit is 1 5 times.

To speed up the last bit, look at the part about the rightmost two places. It uses each digit 10 + 1 times. In general, 1 + 10 + ... + 10^n = (10^(n+1) - 1)/9, which we can use to speed up counting even more.
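The closed form for that sum is easy to sanity-check. The snippet below is only a sketch with made-up function names; it compares the term-by-term sum against (10^(n+1) - 1)/9 for a few small n.

```python
def power_sum(n):
    """1 + 10 + ... + 10^n, added term by term."""
    return sum(10 ** k for k in range(n + 1))


def power_sum_closed_form(n):
    """The same value from the closed form (10^(n+1) - 1) / 9."""
    return (10 ** (n + 1) - 1) // 9


for n in range(6):
    assert power_sum(n) == power_sum_closed_form(n)

print(power_sum_closed_form(2))   # 111
```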

My algorithm is to count up from the start number to the end number (using base-10 counting), but use the fact above to do it quickly. You iterate through the digits of the starting number from least to most significant, and at each place you count up so that that digit is the same as the one in the ending number. At each point, n is the number of up-counts you need to do before you get to a carry, and m the number you need to do afterwards.

Now let's assume pseudocode counts as a language. Here, then, is what I would do:

convert start and end numbers to digit arrays start[] and end[]
create an array counts[] with 10 elements which stores the number of copies of
     each digit that you need

iterate through start number from right to left. at the i-th digit,
    let d be the number of digits you must count up to get from this digit
        to the i-th digit in the ending number. (i.e. subtract the equivalent
        digits mod 10)
    add d * (10^i - 1)/9 to each entry in counts.
    let m be the numerical value of all the digits to the right of this digit,
        n be 10^i - m.
    for each digit e from the left of the starting number up to and including the
        i-th digit, add n to the count for that digit.
    for j in 1 to d
        increment the i-th digit by one, including doing any carries
        for each digit e from the left of the starting number up to and including
            the i-th digit, add 10^i to the count for that digit
    for each digit e from the left of the starting number up to and including the
        i-th digit, add m to the count for that digit.
    set the i-th digit of the starting number to be the i-th digit of the ending
        number.

Oh, and since the value of i increases by one each time, keep track of your old 10^i and just multiply it by 10 to get the new one, instead of exponentiating each time.

Noah Lavine 2010-01-13 20:01:37

"Second, count from 524 to 604. The rightmost digit does 6 cycles through all of the digits, so you need 6 copies of each digit." Is there a typo here? I'm reading it as though you're bringing the (5)2 up to (6)0, so should that be 8 cycles?

strainer 2010-01-14 17:53:16

Yeah, that's a bug.

Noah Lavine 2010-01-15 19:50:50

Answer 3

+3 A:

Your approach is fine. I'm not sure why you would ever need anything faster than what you've described.

Or, this would give you an instantaneous solution: Before you actually need it, calculate what you would need from 1 to some maximum number. You can store the numbers needed at each step. If you have a range like your second example, it would be what's needed for 1 to 300, minus what's needed for 1 to 50.

Now you have a lookup table that can be called at will. Doing up to 10,000 would only take a few MB and, what, a few minutes to compute, once?
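One way the lookup table could be laid out, as a sketch only: keep a running tally for 1..n at every n, then answer a range by subtracting two rows. The names build_prefix_tallies and tally_range are hypothetical, and for an inclusive range low..high the row subtracted is the one for low - 1.

```python
def build_prefix_tallies(max_number):
    """prefix[n][d] = times digit d is written in 1..n; prefix[0] is all zeros."""
    prefix = [[0] * 10]
    for n in range(1, max_number + 1):
        row = prefix[-1][:]          # copy the running totals so far
        for ch in str(n):
            row[int(ch)] += 1
        prefix.append(row)
    return prefix


def tally_range(prefix, low, high):
    """Digits needed for low..high inclusive (assumes 1 <= low <= high)."""
    return [hi - lo for hi, lo in zip(prefix[high], prefix[low - 1])]


PREFIX = build_prefix_tallies(10_000)   # computed once, reused for every query
print(tally_range(PREFIX, 1, 49))       # [4, 15, 15, 15, 15, 5, 5, 5, 5, 5]
```

Each query is then a constant-time subtraction, at the cost of holding one row of ten counters per number up to the chosen maximum.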

John at CashCommons 2010-01-13 20:01:49

Answer 4

A:

This doesn't answer your exact question, but it's interesting to note the distribution of first digits according to Benford's Law. For example, in many real-world sets of numbers, about 30% of them start with "1", which is somewhat counter-intuitive.

I don't know of any distributions describing subsequent digits, but you might be able to determine this empirically and come up with a simple formula for computing an approximate number of digits required for any range of numbers.
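For reference, Benford's Law is usually stated as P(d) = log10(1 + 1/d) for a leading digit d from 1 to 9. The short sketch below just evaluates that formula; the function name is made up for illustration.

```python
import math


def benford_probability(first_digit):
    """Benford's Law: probability that a number's leading digit is first_digit (1..9)."""
    return math.log10(1 + 1 / first_digit)


for d in range(1, 10):
    print(d, round(benford_probability(d), 3))
```

The value for 1 comes out at roughly 0.301, which is where the "30%" figure comes from.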

Alex Reisner 2010-01-13 20:15:19

Thanks, very interesting stuff. I found this googling: http://blogs.msdn.com/ericlippert/archive/2005/01/12/benford-s-law.aspx

Carlos Gutiérrez 2010-01-14 03:00:31

Answer 5

+1 A:

If "better" means "clearer," then I doubt it. If it means "faster," then yes, but I wouldn't use a faster algorithm in place of a clearer one without a compelling need.

#!/usr/bin/ruby1.8

def digits_for_range(min, max, leading_zeros)
  bins = [0] * 10
  format = [
    '%',
    ('0' if leading_zeros),
    max.to_s.size,
    'd',
  ].compact.join
  (min..max).each do |i|
    s = format % i
    for digit in s.scan(/./)
      bins[digit.to_i] += 1 unless digit == ' '
    end
  end
  bins
end

p digits_for_range(1, 49, false) 
# => [4, 15, 15, 15, 15, 5, 5, 5, 5, 5]

p digits_for_range(1, 49, true)
# => [13, 15, 15, 15, 15, 5, 5, 5, 5, 5]

p digits_for_range(1, 10000, false)
# => [2893, 4001, 4000, 4000, 4000, 4000, 4000, 4000, 4000, 4000]

Ruby 1.8, a language known to be "dog slow," runs the above code in 0.135 seconds. That includes loading the interpreter. Don't give up an obvious algorithm unless you need more speed.

Wayne Conrad 2010-01-13 21:39:13

Answer 6

+5 A:

I asked this question on Math Overflow, and got spanked for asking such a simple question. One of the users took pity on me and said if I posted it to The Art of Problem Solving, he would answer it; so I did.

Here is the answer he posted:
http://www.artofproblemsolving.com/Forum/viewtopic.php?p=1741600#1741600

Embarrassingly, my math-fu is inadequate to understand what he posted (the guy is 19 years old...that is so depressing). I really need to take some math classes.

On the bright side, the equation is recursive, so it should be a simple matter to turn it into a recursive function with a few lines of code, by someone who understands the math.

Robert Harvey 2010-01-14 00:34:12

I think that the answer you point us to is (approximately) the mathematical statement of the pseudo-code given by @noahlavine. Note that the mathematical version puts leading 0s on numbers.

High Performance Mark 2010-01-14 01:24:32

Robert: Thanks for taking the time, I'm going to try to read that. (Yes, it is depressing) (At least you got an Editor badge at math overflow :)

Carlos Gutiérrez 2010-01-14 02:51:41

I happened to take a gander at this one while I was revising mine, and it turns out his solution is not quite correct. It's almost there, but his `m + 1` should only be accounted for when digit `i` is the MSD, and he doesn't take into account padding zeros for the remainder (explained in my answer).

Aaronaught 2010-01-14 18:16:02

Answer 7

+2 A:

To reel off the digits from a number, we'd only ever need to do a costly string conversion if we couldn't do a mod. Digits can most quickly be pulled off a number like this:

feed = number;
do {
    digit = feed % 10;   // lowest remaining digit
    feed /= 10;          // drop that digit
    // use digit... e.g. digitTally[digit]++;
} while (feed > 0);

That loop should be very fast, and it can simply be placed inside a loop over the start to end numbers for the simplest way to tally the digits.

To go faster for larger ranges of numbers, I'm looking for an optimised method of tallying all digits from 0 to number*10^significance (going from a start to an end bamboozles me).

Here is a table showing digit tallies for some upper bounds with a single significant digit. The tallies include 0 but not the top value itself. That was an oversight, but with the top value's digits absent it is maybe a bit easier to see the patterns. The tallies don't include leading zeros.

digit  1  10  100  1000  10000  2  20  30  40  60  90  200  600  2000  6000

    0  1   1   10   190   2890  1   2   3   4   6   9   30  110   490  1690
    1  0   1   20   300   4000  1  12  13  14  16  19  140  220  1600  2800
    2  0   1   20   300   4000  0   2  13  14  16  19   40  220   600  2800
    3  0   1   20   300   4000  0   2   3  14  16  19   40  220   600  2800
    4  0   1   20   300   4000  0   2   3   4  16  19   40  220   600  2800
    5  0   1   20   300   4000  0   2   3   4  16  19   40  220   600  2800
    6  0   1   20   300   4000  0   2   3   4   6  19   40  120   600  1800
    7  0   1   20   300   4000  0   2   3   4   6  19   40  120   600  1800
    8  0   1   20   300   4000  0   2   3   4   6  19   40  120   600  1800
    9  0   1   20   300   4000  0   2   3   4   6   9   40  120   600  1800

Edit: clearing up my original thoughts:

From the brute-force table showing tallies from 0 (included) to a power of ten (not included), it is visible that a major digit md at ten-power tp:

increments tally[0 to 9] by md*tp*10^(tp-1)
increments tally[1 to md-1] by 10^tp
decrements tally[0] by (10^tp - 10) 
(to remove leading 0s if tp>leadingzeros)
can increment tally[moresignificantdigits] by self(md*10^tp) 
(to complete an effect)

If these tally adjustments were applied for each significant digit, the tally should be modified as though counted from 0 to end-1.

The adjustments can be inverted to remove the preceding range (up to the start number).

Thanks Aaronaught for your complete and tested answer.

strainer 2010-01-14 02:35:05

Answer 8

+5 A:

Hi

Here's a very bad answer, I'm ashamed to post it. I asked Mathematica to tally the digits used in all numbers from 1 to 1,000,000, no leading 0s. Here's what I got:

Next time you're ordering sticky digits for selling in your hardware store, order in these proportions, you won't be far wrong.

Regards

Mark

High Performance Mark 2010-01-14 03:00:51

Holy cow, someone upvoted this !

High Performance Mark 2010-01-14 03:14:49

I upvoted it too ... that's instructive.

John at CashCommons 2010-01-14 03:18:27

Yes, it is instructive. It suggests that there might be a simpler answer than I originally thought (involving proportions).

Robert Harvey 2010-01-14 04:06:37

different pattern - digits used in all numbers, in all numbers from 1 to 1000; (0 94905) (1 188700) (2 177600) (3 166500) (4 155400) (5 144300) (6 133200) (7 122100) (8 111000) (9 99900)

strainer 2010-01-15 01:40:56

Answer 9

+5 A:

There's a clear mathematical solution to a problem like this. Let's assume the value is zero-padded to the maximum number of digits (it's not, but we'll compensate for that later), and reason through it:

From 0-9, each digit occurs once
From 0-99, each digit occurs 20 times (10x in position 1 and 10x in position 2)
From 0-999, each digit occurs 300 times (100x in P1, 100x in P2, 100x in P3)

The obvious pattern for any given digit, if the range is from 0 to a power of 10, is N * 10^(N-1), where N is the power of 10.
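The N * 10^(N-1) pattern can be confirmed by brute force for small N. This is just a sketch: padded_digit_counts is a made-up helper that zero-pads every value to N digits, matching the zero-padding assumption above.

```python
from collections import Counter


def padded_digit_counts(n_digits):
    """Tally digits over 0 .. 10**n_digits - 1, each zero-padded to n_digits."""
    counts = Counter()
    for value in range(10 ** n_digits):
        counts.update(str(value).zfill(n_digits))
    return counts


for n in (1, 2, 3):
    counts = padded_digit_counts(n)
    expected = n * 10 ** (n - 1)       # 1, 20, 300
    assert all(counts[str(d)] == expected for d in range(10))
    print(n, expected)
```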

What if the range is not a power of 10? Start with the lowest power of 10, then work up. The easiest case to deal with is a maximum like 399. We know that for each multiple of 100, each digit occurs at least 20 times, but we have to compensate for the number of times it appears in the most-significant-digit position, which is going to be exactly 100 for digits 0-3, and exactly zero for all other digits. Specifically, the extra amount to add is 10^N for the relevant digits.

Putting this into a formula, for upper bounds that are 1 less than some multiple of a power of 10 (i.e. 399, 6999, etc.) it becomes: M * N * 10^(N-1) + iif(d < M, 10^N, 0), where M is the multiple (4 for 399) and d is the digit being counted.

Now you just have to deal with the remainder (which we'll call R). Take 445 as an example. This is whatever the result is for 399, plus the range 400-445. In this range, the MSD occurs R more times, and all digits (including the MSD) also occur at the same frequencies they would from range [0 - R].

Now we just have to compensate for the leading zeros. This pattern is easy - it's just:

10^N + 10^(N-1) + 10^(N-2) + ... + 10^0

Update: This version correctly takes into account "padding zeros", i.e. the zeros in middle positions when dealing with the remainder (the middle 0 in 400, 401, 402, ...). Figuring out the padding zeros is a bit ugly, but the revised code (C-style pseudocode) handles it:

function countdigits(int d, int low, int high) {
    return countdigits(d, low, high, false);
}

function countdigits(int d, int low, int high, bool inner) {
    if (high == 0)
        return (d == 0) ? 1 : 0;

    if (low > 0)
        return countdigits(d, 0, high) - countdigits(d, 0, low);

    int n = floor(log10(high));
    int m = floor((high + 1) / pow(10, n));
    int r = high - m * pow(10, n);
    return
        (max(m, 1) * n * pow(10, n-1)) +                             // (1)
        ((d < m) ? pow(10, n) : 0) +                                 // (2)
        (((r >= 0) && (n > 0)) ? countdigits(d, 0, r, true) : 0) +   // (3)
        (((r >= 0) && (d == m)) ? (r + 1) : 0) +                     // (4)
        (((r >= 0) && (d == 0)) ? countpaddingzeros(n, r) : 0) -     // (5)
        (((d == 0) && !inner) ? countleadingzeros(n) : 0);           // (6)
}

function countleadingzeros(int n) {
    // 10^n + 10^(n-1) + ... + 10^0, i.e. the leading-zero sum given above
    return (n == 0) ? 1 : pow(10, n) + countleadingzeros(n - 1);
}

function countpaddingzeros(int n, int r) {
    return (r + 1) * max(0, n - max(0, floor(log10(r))) - 1);
}

As you can see, it's gotten a bit uglier but it still runs in O(log n) time, so if you need to handle numbers in the billions, this will still give you instant results. :-) And if you run it on the range [0 - 1000000], you get the exact same distribution as the one posted by High-Performance Mark, so I'm almost positive that it's correct.

FYI, the reason for the inner variable is that the leading-zero function is already recursive, so it can only be counted in the first execution of countdigits.

Update 2: In case the code is hard to read, here's a reference for what each line of the countdigits return statement means (I tried inline comments but they made the code even harder to read):

(1) Frequency of any digit up to highest power of 10 (0-99, etc.)
(2) Frequency of MSD above any multiple of highest power of 10 (100-399)
(3) Frequency of any digits in remainder (400-445, R = 45)
(4) Additional frequency of MSD in remainder
(5) Count zeros in middle position for remainder range (404, 405...)
(6) Subtract leading zeros only once (on outermost loop)
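For cross-checking results, here is a runnable sketch in Python. It is not a translation of countdigits: it swaps in a per-position counting scheme that looks at one decimal place at a time, and the names count_digit_up_to and digits_in_range are illustrative. It assumes numbers are written without leading zeros and that 1 <= low <= high.

```python
def count_digit_up_to(d, n):
    """Count how often digit d is written in 1..n (no leading zeros)."""
    total = 0
    p = 1                                  # place value: 1, 10, 100, ...
    while p <= n:
        higher = n // (p * 10)             # digits to the left of this place
        current = (n // p) % 10            # digit of n at this place
        lower = n % p                      # digits to the right of this place
        if d == 0:
            # Zero is never the leading digit, so the first block is skipped.
            if current == 0:
                total += (higher - 1) * p + lower + 1
            else:
                total += higher * p
        elif current > d:
            total += (higher + 1) * p
        elif current == d:
            total += higher * p + lower + 1
        else:
            total += higher * p
        p *= 10
    return total


def digits_in_range(low, high):
    """Tally of each digit 0..9 over low..high inclusive."""
    return [count_digit_up_to(d, high) - count_digit_up_to(d, low - 1)
            for d in range(10)]


print(digits_in_range(1, 49))   # [4, 15, 15, 15, 15, 5, 5, 5, 5, 5]
```

The loop runs once per decimal place of n, so the cost grows with the number of digits rather than with the size of the range.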

Aaronaught 2010-01-14 03:52:52

Code appears to contain infinite recursion. The r variable settles on 1 instead of 0.

Robert Harvey 2010-01-14 17:06:04

It's been corrected now (and actually tested).

Aaronaught 2010-01-14 18:18:11

Answer 10

A:

phord 2010-01-15 18:01:59
