views:

921

answers:

5

How can I make unique URL in Python a la http://imgur.com/gM19g or http://tumblr.com/xzh3bi25y When using uuid from python I get a very large one. I want something shorter for URLs.

+2  A: 

The reason UUIDs are long is because they contain lots of information so that they can be guaranteed to be globally unique.

If you want something shorter, then you'll need to do something like generate a random string, checking whether it is in the universe of already generated strings, and repeating until you get an unused string. You'll also need to watch out for concurrency here (what if the same string gets generated by a separate process before you inserted into the set of strings?).

If you need some help generating random strings in Python, this other question might help.

Dominic Rodger
+1  A: 

It doesn't really matter that this is Python, but you just need a hash function that maps to the length you want. For example, maybe use MD5 and then take just the first n characters. You'll have to watch out for collisions in that case, though, so you might want to pick something a little more robust in terms of collision detection (like using primes to cycle through the space of hash strings).

Gordon Worley
+6  A: 

I'm not sure most URL shorteners use a random string. My impression is they write the URL to a database, then use the integer ID of the new record as the short URL, encoded base 36 or 62 (letters+digits).

Python code to convert an int to a string in arbitrary bases is here.

Ned Batchelder
+4  A: 

Edit: Here, I wrote a module for you. Use it. http://code.activestate.com/recipes/576918/


Counting up from 1 will guarantee short, unique URLS. /1, /2, /3 ... etc.

Adding uppercase and lowercase letters to your alphabet will give URLs like those in your question. And you're just counting in base-62 instead of base-10.

Now the only problem is that the URLs come consecutively. To fix that, read my answer to this question here:

http://stackoverflow.com/questions/1051949/map-incrementing-integer-range-to-six-digit-base-26-max-but-unpredictably/1052896#1052896

Basically the approach is to simply swap bits around in the incrementing value to give the appearance of randomness while maintaining determinism and guaranteeing that you don't have any collisions.

FogleBird
+1  A: 

I don't know if you can use this, but we generate content objects in Zope that get unique numeric ids based on current time strings, in millis (eg, 1254298969501)

Maybe you can guess the rest. Using the recipe described here: http://stackoverflow.com/questions/561486/how-to-convert-an-integer-to-the-shortest-url-safe-string-in-python, we encode and decode the real id on the fly, with no need for storage. A 13-digit integer is reduced to 7 alphanumeric chars in base 62, for example.

To complete the implementation, we registered a short (xxx.yy) domain name, that decodes and does a 301 redirect for "not found" URLs,

If I was starting over, I would subtract the "starting-over" time (in millis) from the numeric id prior to encoding, then re-add it when decoding. Or else when generating the objects. Whatever. That would be way shorter..

Ken