views:

98

answers:

4

I'm working on making a URL shortener for my site, and my current plan (I'm open to suggestions) is to use a node ID to generate the shortened URL. So, in theory, node 26 might be short.com/z, node 1 might be short.com/a, node 52 might be short.com/Z, and node 104 might be short.com/ZZ. Something like that. And when a user goes to that URL, I need to reverse the process (obviously).

I can think of some kludgy ways to go about this, but I'm guessing there are better ones. Any suggestions?

A: 

Use hex(id)[2:] and int(urlpart, 16). There are other options. base32 encoding your id could work as well, but I don't know that there's any library that does base32 encoding built into Python.

Apparently a base32 encoder was introduced in Python 2.4 with the base64 module. You might try using b32encode and b32decode. You should give True for both the casefold and map01 options to b32decode in case people write down your shortened URLs.

Actually, I take that back. I still think base32 encoding is a good idea, but that module is not useful for the case of URL shortening. You could look at the implementation in the module and make your own for this specific case. :-)

Omnifarious
A: 

To ascii int.

ord('a')

gives 97

And back to a string:

str(unichr(97))

gives 'a'

Dominic Bou-Samra
A: 

I would simply use code like the following to set up a couple of translation tables, one from node to ASCII, the other to go back the other way.

The sample code below provides 36 nodes for one-character, 1,332 for up to two characters, 47,998 for up to three characters and a whopping 1,727,604 for up to four characters but you should start being wary about the sizes of the tables at that point (run-time conversion, rather than pre-calculating lookup tables, may be a better option if you get to that point).

Keep in mind that this was with digits and lowercase letters only. If you decide to use uppercase, the quantities are:

length = 1     node count =         62
         2                       3,906
         3                     242,234
         4                  15,018,570

Sample code follows below:

nd_to_asc = []
asc_to_nd = {}
full_range = range(48,58) + range(97,123)

# One, two, three and four-length codes.

for p1 in full_range:
    nd_to_asc.append (chr(p1))
    for p2 in full_range:
        nd_to_asc.append ("%s%s"%(chr(p1),chr(p2)))
        for p3 in full_range:
            nd_to_asc.append ("%s%s%s"%(chr(p1),chr(p2),chr(p3)))
            for p4 in full_range:
                nd_to_asc.append ("%s%s%s%s"%(chr(p1),chr(p2),chr(p3),chr(p4)))

# Reverse lookup.
for i in range(len(nd_to_asc)):
  asc_to_nd[nd_to_asc[i]] = i

print len(nd_to_asc)
paxdiablo
+1  A: 

What about BASE58 encoding the URL? Like for example flickr does.

# note the missing lowercase L and the zero etc.
BASE58 = '123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ' 
url = ''
while node_id >= 58:
    div, mod = divmod(node_id, 58)
    url = BASE58[mod] + url
    node_id = int(div)

return 'http://short.com/%s' % BASE58[node_id] + url

Turning that back into a number isn't a big deal either.

Ivo Wetzel
Good point about the l and 0.
Duncan
This is great. I ended up finding another (more complete) answer here on SO though: http://stackoverflow.com/questions/1119722/base-62-conversion-in-python
mlissner