Quite often one has to encode a big (e.g. 128- or 160-bit) number in a URL. For example, many web applications use md5(random()) for UUIDs.

If you need to put that value in a URL, the common approach is to just encode it as a hexadecimal string.

But obviously hex encoding is not a very tight encoding. What other approaches are there which fit nicely in an URL?

+2  A: 

If you want it tighter, you can use a base-36 encoding (digits 0-9 and letters A-Z).
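
A minimal sketch of what base-36 encoding could look like in Python. The names `ALPHABET`, `base36_encode` and `base36_decode` are made up for illustration, and the uppercase alphabet is an assumption based on "0 to Z":

```python
import string

# Assumed alphabet: digits first, then uppercase letters ("0 to Z").
ALPHABET = string.digits + string.ascii_uppercase


def base36_encode(number: int) -> str:
    """Encode a non-negative integer as a base-36 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        # Peel off the least significant base-36 digit each round.
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def base36_decode(text: str) -> int:
    """Decode a base-36 string; the built-in int() accepts bases up to 36."""
    return int(text, 36)
```

With this, any 128-bit value fits in at most 25 characters, compared with 32 for hex.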

Otávio Décio
+2  A: 

You can do even better with base64url encoding (a-z, A-Z, 0-9, - and _; see RFC 4648, Section 5). RFC 4648 covers a number of different encoding methods (base16, base32, and base64) and a couple of variants. Also, depending on the sparsity of the bits that are set in the number, you could conceivably run it through gzip and then use one of the described encoding methods. Of course, whether gzip helps depends on how large the number is; for values as small as 128 or 160 bits the compression overhead will likely outweigh any gain.
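
As a rough illustration, Python's standard `base64` module already provides the URL-safe alphabet. The helper names `encode_id` and `decode_id` are hypothetical, and stripping the `=` padding is my own choice to save characters, not something the RFC requires:

```python
import base64
import uuid


def encode_id(raw: bytes) -> str:
    """URL-safe base64 with the trailing '=' padding stripped."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_id(token: str) -> bytes:
    """Restore the padding that encode_id stripped, then decode."""
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding)


raw = uuid.uuid4().bytes   # 16 random bytes (128 bits)
token = encode_id(raw)     # 22 characters for a 16-byte input
assert decode_id(token) == raw
```

A 16-byte value comes out as 22 characters this way, against 32 for hex.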

Kevin Loney
+4  A: 

I would use the "URL and Filename safe" Base 64 Alphabet.

Base 64 uses two character sets.

Data: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
URLs: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_

To use base 64, pad your value to a multiple of 3 bytes (24 bits), then split each 24-bit block into four 6-bit values. Each 6-bit value is looked up by position in the strings I gave above.

If it all goes well, your final base64 value will always be a multiple of 4 characters long and decode back to a multiple of 3 (8bit) bytes long.
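
The padding and splitting steps described above can be sketched by hand like this. `URL_ALPHABET` and `b64url_manual` are names invented for the example; in practice the built-in functions are the safer choice:

```python
import base64

URL_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_"
)


def b64url_manual(data: bytes) -> str:
    """Encode bytes with the URL-safe alphabet, 3 bytes at a time."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)
        # Pad the final chunk with zero bytes up to 24 bits.
        block = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Split the 24 bits into four 6-bit values, most significant first.
        chars = [URL_ALPHABET[(block >> shift) & 0x3F]
                 for shift in (18, 12, 6, 0)]
        # Characters that came only from padding are written as '='.
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)


# Cross-check against the standard library on a 16-byte value.
sample = bytes(range(16))
assert b64url_manual(sample) == base64.urlsafe_b64encode(sample).decode("ascii")
```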

Depending on the language you are using, there is a good chance it has built-in encode and decode functions for this.

John
A: 

Just use hex. Even if you were to get 8 bits per character, you'd still be using a 16-20 character random sequence, which nobody will want to type or say. If you can't put up a short identifier, work on your search capabilities.
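
To put numbers on that, here is a small sketch that counts how many characters each alphabet size needs for 128 and 160 bits. `chars_needed` is a hypothetical helper, and the 256-symbol row is only the theoretical "8 bits per character" case, not a usable URL alphabet:

```python
def chars_needed(bits: int, alphabet_size: int) -> int:
    """Smallest n such that alphabet_size**n covers every value of `bits` bits."""
    n = 0
    capacity = 1
    while capacity < 2 ** bits:
        capacity *= alphabet_size
        n += 1
    return n


for bits in (128, 160):
    for size in (16, 32, 36, 64, 256):
        print(f"{bits} bits, {size} symbols: {chars_needed(bits, size)} chars")
```

For 128 bits this gives 32 characters in hex, 26 in base32, 25 in base36, 22 in base64 and 16 at a full 8 bits per character.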

wowest
A: 

Using the hint of base36, I currently use something like this (base32 rather than base36, in Python):

>>> import base64, uuid
>>> base64.b32encode(uuid.uuid1().bytes).decode('ascii').rstrip('=')
'MTB2ONDSL3YWJN3CA6XIG7O4HM'
mdorseif