Imagine you sell those metallic digits used to number houses, locker doors, hotel rooms, etc. You need to find how many of each digit to ship when your customer needs to number doors/houses:
- 1 to 100
- 51 to 300
- 1 to 2,000 with zeros to the left
The obvious solution is to do a loop from the first to the last number, convert the counter to a string with or without zeros to the left, extract each digit and use it as an index to increment an array of 10 integers.
I wonder if there is a better way to solve this, without having to loop through the entire integers range.
Solutions in any language or pseudocode are welcome.
Edit:
Answers review
John at CashCommons and Wayne Conrad comment that my current approach is good and fast enough. Let me use a silly analogy: If you were given the task of counting the squares in a chess board in less than 1 minute, you could finish the task by counting the squares one by one, but a better solution is to count the sides and do a multiplication, because you later may be asked to count the tiles in a building.
Alex Reisner points to a very interesting mathematical law that, unfortunately, doesn’t seem to be relevant to this problem.
Andres suggests the same algorithm I’m using, but extracting digits with %10 operations instead of substrings.
John at CashCommons and phord propose pre-calculating the digits required and storing them in a lookup table or, for raw speed, an array. This could be a good solution if we had an absolute, unmovable, set in stone, maximum integer value. I’ve never seen one of those.
High-Performance Mark and strainer computed the needed digits for various ranges. The result for one millon seems to indicate there is a proportion, but the results for other number show different proportions.
strainer found some formulas that may be used to count digit for number which are a power of ten.
Robert Harvey had a very interesting experience posting the question at MathOverflow. One of the math guys wrote a solution using mathematical notation.
Aaronaught developed and tested a solution using mathematics. After posting it he reviewed the formulas originated from Math Overflow and found a flaw in it (point to Stackoverflow :).
noahlavine developed an algorithm and presented it in pseudocode.
A new solution
After reading all the answers, and doing some experiments, I found that for a range of integer from 1 to 10n-1:
- For digits 1 to 9, n*10(n-1) pieces are needed
- For digit 0, if not using leading zeros, n*10n-1 - ((10n-1) / 9) are needed
- For digit 0, if using leading zeros, n*10n-1 - n are needed
The first formula was found by strainer (and probably by others), and I found the other two by trial and error (but they may be included in other answers).
For example, if n = 6, range is 1 to 999,999:
- For digits 1 to 9 we need 6*105 = 600,000 of each one
- For digit 0, without leading zeros, we need 6*105 – (106-1)/9 = 600,000 - 111,111 = 488,889
- For digit 0, with leading zeros, we need 6*105 – 6 = 599,994
These numbers can be checked using High-Performance Mark results.
Using these formulas, I improved the original algorithm. It still loops from the first to the last number in the range of integers, but, if it finds a number which is a power of ten, it uses the formulas to add to the digits count the quantity for a full range of 1 to 9 or 1 to 99 or 1 to 999 etc. Here's the algorithm in pseudocode:
integer First,Last //First and last number in the range integer Number //Current number in the loop integer Power //Power is the n in 10^n in the formulas integer Nines //Nines is the resut of 10^n - 1, 10^5 - 1 = 99999 integer Prefix //First digits in a number. For 14,200, prefix is 142 array 0..9 Digits //Will hold the count for all the digits FOR Number = First TO Last CALL TallyDigitsForOneNumber WITH Number,1 //Tally the count of each digit //in the number, increment by 1 //Start of optimization. Comments are for Number = 1,000 and Last = 8,000. Power = Zeros at the end of number //For 1,000, Power = 3 IF Power > 0 //The number ends in 0 00 000 etc Nines = 10^Power-1 //Nines = 10^3 - 1 = 1000 - 1 = 999 IF Number+Nines <= Last //If 1,000+999 < 8,000, add a full set Digits[0-9] += Power*10^(Power-1) //Add 3*10^(3-1) = 300 to digits 0 to 9 Digits[0] -= -Power //Adjust digit 0 (leading zeros formula) Prefix = First digits of Number //For 1000, prefix is 1 CALL TallyDigitsForOneNumber WITH Prefix,Nines //Tally the count of each //digit in prefix, //increment by 999 Number += Nines //Increment the loop counter 999 cycles ENDIF ENDIF //End of optimization ENDFOR SUBROUTINE TallyDigitsForOneNumber PARAMS Number,Count REPEAT Digits [ Number % 10 ] += Count Number = Number / 10 UNTIL Number = 0
For example, for range 786 to 3,021, the counter will be incremented:
- By 1 from 786 to 790 (5 cycles)
- By 9 from 790 to 799 (1 cycle)
- By 1 from 799 to 800
- By 99 from 800 to 899
- By 1 from 899 to 900
- By 99 from 900 to 999
- By 1 from 999 to 1000
- By 999 from 1000 to 1999
- By 1 from 1999 to 2000
- By 999 from 2000 to 2999
- By 1 from 2999 to 3000
- By 1 from 3000 to 3010 (10 cycles)
- By 9 from 3010 to 3019 (1 cycle)
- By 1 from 3019 to 3021 (2 cycles)
Total: 28 cycles Without optimization: 2,235 cycles
Note that this algorithm solves the problem without leading zeros. To use it with leading zeros, I used a hack:
If range 700 to 1,000 with leading zeros is needed, use the algorithm for 10,700 to 11,000 and then substract 1,000 - 700 = 300 from the count of digit 1.
Benchmark and Source code
I tested the original approach, the same approach using %10 and the new solution for some large ranges, with these results:
Original 104.78 seconds With %10 83.66 With Powers of Ten 0.07
A screenshot of the benchmark application:
If you would like to see the full source code or run the benchmark, use these links:
- Complete Source code (in Clarion): http://sca.mx/ftp/countdigits.txt
- Compilable project and win32 exe: http://sca.mx/ftp/countdigits.zip
Accepted answer
noahlavine solution may be correct, but l just couldn’t follow the pseudo code, I think there are some details missing or not completely explained.
Aaronaught solution seems to be correct, but the code is just too complex for my taste.
I accepted strainer’s answer, because his line of thought guided me to develop this new solution.