tags:

views:

195

answers:

6

How can google short URL cater for so many URLs in web with just four character, even though alphabets are case sensitive?

http://goo.gl/SUWp

Say fn(some url)-> four letter for url, how can they suddenly use the same function which gives five letter for url after sometime?How will they know whether it is in four letter or five letter url from url?

+5  A: 

26 letters * 2 (upper/lowercase) = 52 ^ 4 (to the power of 4) = 7311616 urls

if they add digits, it would be 62^4 = 14776336 urls.

so they have some time to go before adding a 5th letter/digit

Toad
If they ever reach that max, they can just add one more space making it 380,204,032
Doug
+1  A: 

It works the same way as all the other shorteners- the characters are a unique ID of the URL that has been shortened. With 52 letters (upper and lower case) plus numbers and special characters, there are plenty of combinations to work with.

Dave Swersky
I don't think it's really a hash, but rather incremental and stored in a table. You might argue that _is_ a hash in one sense, but they're not really the same thing.
Joel Coehoorn
It's probably not a hash since 4 digit/letters is too short for it to be a hash (it would collide too often).
Toad
@Joel/reiner: I wondered about the shortness... you're right there's no way that could be a "traditional" hash.
Dave Swersky
A: 

This is how:

Google URL Shortener is currently available for Google products and not for broader consumer use.

I don't believe even Google has 7 million pages worth shortening.

Edit:

Apparently you can shorten URLs using the Google Toolbar:

Google URL shortener is not a stand-alone service; you can't use it to shorten links directly. Currently, Google URL Shortener is only available from the Google Toolbar and FeedBurner. If the service proves useful, we may eventually make it available for a wider audience in the future.

Still, that is not "broad" consumer use. If they run out, they'll add more letters.

Response to updated question:

Say fn(some url)-> four letter for url, how can they suddenly use the same function which gives five letter for url after sometime?

Google is not simply hashing the URL then just using it (remember, hashes are only 1 way so you couldn't get the original URL out of them anyway - it must be stored in a database). They may start with a hash, then perform a lookup in a database to see if that key already exists. If it does not, it will be used as the key. If it already exists, they'll use some other method to perform the hash, or manipulate the hash in a way that makes it unique.

How will they know whether it is in four letter or five letter url from url?

If the end of the URL has 4 letters, then that's how they know...

John Rasch
Not true -- the URL in the question points to this question.
alvherre
+1 for answering my second question, but that is my primary question
yesraaj
I mean from long url
yesraaj
I wonder if that's really how it works. Take the number of potential URLs JCasso came up with: 52 * 52 * 52 * 52 = 7.311.616. That doesn't seem like a lot. As the table of existing hashes begins to fill up, it becomes more likely that you'll generate a duplicate over and over again.
Jeff
@yesraaj - only a Google developer could tell you exactly (it may not even be a traditional "hash" that they use. However, once a short URL is created, there's a record stored on the database that maps the two so that you can interchange them easily.
John Rasch
@Jeff - an easy fix for that would be to think of the hash as an integer and increment it until you find a unique. Google is pretty smart, they could calculate how long before the birthday paradox starts hurting the function and then start using 5 letters.
John Rasch
A: 

(26 + 26 + 10) * (26 + 26 + 10) * (26 + 26 + 10) * (26 + 26 + 10) = 14776336

That's 26 lower case, 26 upper case, and 10 digits for 62 possible characters. And actually I think it's likely a base-64 encoded representation of some other value, so the number is probably more like 16777216.

Joel Coehoorn
What's the +10 for?
Doug
doug: 10 digits
Toad
Thanks :) From the first post, I thought they only used alphabets and Joel didn't mention digits yet.
Doug
+1  A: 

There are 26 letters in English Alphabet. Lower + Upper it is 52.

52 * 52 * 52 * 52 = 7.311.616 They are limited with this number. If they run out of 4 letter urls, they can upgrade to 5 without any problem, Cannot they?

I don't think adding digits is a good idea for that since 0 (zero) and O, 1 (one) and l (L) are very similar.

JCasso
A: 

I don't know how Google does it. But I imagine one way you could implement a short URL is by incrementing the value using the characters 0-9a-zA-Z -- essentially using a base 62 number system. So, the method that generates the value might look up the most recently used value, then increment it by one. For example: abcz + 1 = abcA. Or: ZZZZ + 1 = 00000.

Jeff