views:

156

answers:

6

I want to make a short URL service for 2 million assets but I want to use the shortest number of possible characters.

What is the math equation that I would need to use to figure it out? I know it has something to do with factorials, right?

A: 

http://mathforum.org/dr.math/faq/faq.comb.perm.html

Azeem.Butt
it's neither permutations nor combinations, since the characters can be repeated several times.
Lucas
+1  A: 

Number of possible short URLs = (Number of possible different characters in ID) raised to the power of (Length of ID in url)

For instance, if you're only using lowercase characters (of which there are 26) and your URLs look like http://domain.com/XXXXX (for your unique id's of 5 characters), then you can make 26^5 = 11,881,376 short urls.

If you were using upper and lower case letters, you'd have 52, so 52^5 = 380,204,032 possible short URLs, et cetera.

Amber
+1  A: 

You need to answer a number of questions, like what kinds of characters you want to allow in your set.

All letters and all digits? base 36 (5 characters can fit 2mil+)

Distinguish between upper and lowercase? That gets you to base 62 (4 characters)

Remove easily-mistaken characters and numbers (e.g. i/l 0/o)? roughly base 32 (also 5 characters)

Rex M
+11  A: 

It's not a factorial problem, but an exponential one.

If x is the number of possible characters, you need to solve the following equation for y:

x^y = 2000000

If you want to use all numbers and case-sensitive alpha [0-9A-Za-z], you have 62 possible values. This means you need to solve:

     62^y = 2000000
y*log(62) = log(2000000)
        y = log(2000000) / log(62)
        y = 3.5154313828...

Of course, you can't have 3.5 characters in your URL, so you would need 4. If you want to change the character set you are using for your URL's, simply resolve the problem above using the number of values in your set.

Note Solving this equation assumes fixed-length URL's. For variable-length URL's, see Rob's answer.

jheddings
Thank you so much. Very cool.
mager
A: 

According to the HTTP/URI Spec you can additionally use the following "unreserved characters": ALPHA / DIGIT / "-" / "." / "_" / "~"

That adds an additional 4 characters to your radix and thus

Math.log(2000000) / Math.log(66) = 3.4629721616408813

Although this still means you will end up with a 4 character URL path at maximum.

mhaller
+4  A: 

@jheddings is close, and got the right answer, but the math was not quite correct. Don't forget you are not limited to all the permutations of characters of a specific length. You can also leverage URLs of length 1 through y characters. Therefore we want the closed value of this sum:

x + x^2 + x^3 + ... + x^y = 2000000

Fortunately, there is a closed form for that sum:

x + x^2 + x^3 + ... + x^y = x*(x^y - 1)/(x-1) = 2000000

x is the number of possible characters in our range. For simplicity sake, let's assume it only includes lowercase, uppercase, and numbers (26+26+10 = 62.)

Then we get the following equation:

2000000 = (62^(y+1) - 62)/(62-1)
2000000 = (62^(y+1) - 62)/(61)
2000000 * 61 = 62^(y+1) - 62
122000000 = 62^(y+1) - 62
122000000 + 62 = 62^(y+1)
122000062 = 62^(y+1)
log(122000062) = (y+1)
log(122000062) / log(62) = y+1
4.511492 = y+1
3.511492 = y

And, as you said, 3.5 characters is impossible so 4 are required. Admittedly the difference doesn't matter in this case. However, in certain scenarios (especially when dealing with base 2) it is very important.

Rob Rolnick
Good point on the variable-length URL's. I would contend, however, that for fixed-length URL's, my math was correct ;) +1
jheddings