ansaurus

Question

Answer 1

+3 A:

You probably want base64, not base62. There's a URL-safe variant of it floating around (using - and _ in place of + and /), so the two extra characters shouldn't be a problem.

The process is fairly simple: each base64 character represents 6 bits, while a regular byte represents 8. Assign a value from 000000 to 111111 to each of the 64 chosen characters, then use 4 such values (24 bits) to represent each group of 3 base256 bytes (also 24 bits). Repeat for each group of 3 bytes, padding the end with your choice of padding character (standard base64 uses =).
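As a rough illustration of the bit packing described above, here is a minimal sketch that turns each 3-byte group (24 bits) into four 6-bit symbols. The function name `b64_manual` is hypothetical; it assumes the URL-safe alphabet and `=` padding so that its output can be checked against Python's own `base64.urlsafe_b64encode`.

```python
import base64

# URL-safe base64 alphabet: '-' and '_' stand in for '+' and '/'
ALPHABET64 = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              "abcdefghijklmnopqrstuvwxyz"
              "0123456789-_")

def b64_manual(data, pad="="):
    """Encode bytes by packing each 3-byte group into four 6-bit symbols."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        n = len(chunk)
        # Zero-fill a short final group up to 24 bits
        block = int.from_bytes(chunk + b"\x00" * (3 - n), "big")
        # Split the 24-bit block into four 6-bit values, high bits first
        symbols = [ALPHABET64[(block >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 1 input byte needs 2 symbols, 2 bytes need 3, 3 bytes need 4
        used = n + 1
        out.extend(symbols[:used])
        out.append(pad * (4 - used))
    return "".join(out)

if __name__ == "__main__":
    data = b"Hello, world"
    print(b64_manual(data))
    print(base64.urlsafe_b64encode(data).decode("ascii"))  # should match
```

Padding only appears when the input length is not a multiple of 3; for URLs it is usually stripped.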

Williham Totland 2009-07-13 14:26:05

The standard Python base64 encoding functions are not really suitable for short URLs: they are designed for encoding byte strings (i.e. text), and produce longer output than simply converting the number's value to a different base.
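To make the length difference concrete, here is a rough comparison. It assumes the byte-oriented route means base64-encoding the number's decimal text; packing the integer's raw bytes instead would shorten the base64 output somewhat. Both helper names (`to_base62`, `via_base64`) are hypothetical.

```python
import base64

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_base62(num):
    """Convert a non-negative integer directly to base 62."""
    if num == 0:
        return ALPHABET[0]
    digits = []
    while num:
        num, rem = divmod(num, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def via_base64(num):
    """Base64-encode the decimal text of the number (byte-oriented route)."""
    return base64.urlsafe_b64encode(str(num).encode("ascii")).decode("ascii")

if __name__ == "__main__":
    for n in (125, 123456789, 2**40):
        print(n, to_base62(n), via_base64(n))
```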

mikl 2010-04-02 15:34:26

Answer 2

+1 A:

Sorry, I can't help you with a library here. I would prefer using base64 and just adding two extra characters of your choice, if possible!

Then you can use the base64 module.
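A minimal sketch of what using the base64 module on a number could look like, assuming the integer is first converted to big-endian bytes and the trailing = padding is stripped to keep URLs short. `encode_id` and `decode_id` are hypothetical names.

```python
import base64

def encode_id(num):
    """Encode a non-negative integer with URL-safe base64, padding stripped."""
    # Smallest big-endian byte string holding num (at least one byte)
    raw = num.to_bytes((num.bit_length() + 7) // 8 or 1, "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def decode_id(text):
    """Reverse encode_id: restore the padding, then read the bytes as an int."""
    padded = text + "=" * (-len(text) % 4)
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")
```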

If this is really, really not possible:

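You can do it yourself this way:

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
myBase = len(ALPHABET)  # 62
num = 125               # the number to convert
base62vals = []
while num > 0:
    remainder = num % myBase
    num = num // myBase              # integer division
    base62vals.insert(0, remainder)  # most significant digit ends up first
print(''.join(ALPHABET[v] for v in base62vals))  # prints '21'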

Juergen 2009-07-13 14:26:58

Answer 3

+25 A:

There is no standard module for this, but I have written my own functions to achieve that.

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode(num, alphabet=ALPHABET):
    """Encode a number in Base X

    `num`: The number to encode
    `alphabet`: The alphabet to use for encoding
    """
    if num == 0:
        return alphabet[0]
    arr = []
    base = len(alphabet)
    while num:
        rem = num % base
        num = num // base
        arr.append(alphabet[rem])
    arr.reverse()
    return ''.join(arr)

def base62_decode(string, alphabet=ALPHABET):
    """Decode a Base X encoded string into the number

    Arguments:
    - `string`: The encoded string
    - `alphabet`: The alphabet to use for encoding
    """
    base = len(alphabet)
    strlen = len(string)
    num = 0

    for idx, char in enumerate(string):
        power = strlen - (idx + 1)
        num += alphabet.index(char) * (base ** power)

    return num

Note that you can pass any alphabet to use for encoding and decoding.

Hope this helps.

PS: For URL shorteners, I have found that it's better to leave out easily confused characters such as 0, O, o, l, 1 and I. Thus I use this alphabet for my URL shortening needs: "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"
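A minimal sketch of the same approach using that 56-character alphabet, with a precomputed character-to-value dict for decoding. The names `URL_ALPHABET`, `short_encode` and `short_decode` are hypothetical.

```python
# The alphabet from the PS, with easily confused characters removed
URL_ALPHABET = "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"
URL_INDEX = {c: i for i, c in enumerate(URL_ALPHABET)}
BASE = len(URL_ALPHABET)  # 56

def short_encode(num):
    """Encode a non-negative integer with the unambiguous alphabet."""
    if num == 0:
        return URL_ALPHABET[0]
    out = []
    while num:
        num, rem = divmod(num, BASE)
        out.append(URL_ALPHABET[rem])
    return "".join(reversed(out))

def short_decode(text):
    """Inverse of short_encode; raises KeyError on characters outside the alphabet."""
    num = 0
    for ch in text:
        num = num * BASE + URL_INDEX[ch]
    return num
```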

Have fun.

Baishampayan Ghose 2009-07-13 14:27:01

Blixt 2009-07-13 14:32:30

Thanks, just what I was looking for :)

mikl 2009-07-13 14:37:01

base62_encode(-1) :)

wuub 2009-07-13 14:42:44

Oops, I think I'll change it to return 0 if num <= 0 :)

mikl 2009-07-13 14:47:04

Of course, the code is not bullet-proof and all :)

Baishampayan Ghose 2009-07-13 15:26:30
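Why `base62_encode(-1)` hangs: Python's floor division keeps `-1 // 62` at `-1`, so `while num:` never reaches 0. A minimal sketch of a guarded variant that raises `ValueError` instead (returning 0 for `num <= 0`, as suggested above, is another option); `base62_encode_checked` is a hypothetical name.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_encode_checked(num, alphabet=ALPHABET):
    """Like base62_encode, but rejects negative input instead of looping forever."""
    if num < 0:
        # With floor division, -1 // 62 == -1, so the loop would never end
        raise ValueError("num must be non-negative")
    if num == 0:
        return alphabet[0]
    arr = []
    base = len(alphabet)
    while num:
        num, rem = divmod(num, base)
        arr.append(alphabet[rem])
    return "".join(reversed(arr))
```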

Naming bug: it's not base 62, since the alphabet is customizable.

unwind 2009-09-28 14:24:40

For the decode, it's a better habit not to compute the powers (saves time, is shorter to write, but more importantly avoids off-by-one errors), thus: num=0; for char in string: num = num*base + alphabet.index(char)

ShreevatsaR 2009-09-28 14:31:19
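The suggestion above written out as a full function (this is Horner's rule): each step shifts the running total one digit to the left by multiplying by the base, then adds the next digit. This sketch keeps the `alphabet.index` lookup from the comment; a precomputed dict, as John Machin points out, avoids the linear search.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62_decode(string, alphabet=ALPHABET):
    """Decode without computing powers of the base."""
    base = len(alphabet)
    num = 0
    for char in string:
        # Shift previous digits left by one place, then add this digit
        num = num * base + alphabet.index(char)
    return num
```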

@ShreevatsaR: any particular reason for using str.index() instead of a dictionary lookup? See my answer ...

John Machin 2009-10-05 23:47:26

Answer 4

+2 A:

You can download the zbase62 module from PyPI.

For example:

>>> import zbase62
>>> zbase62.b2a("abcd")
'1mZPsa'

ghostdog74 2009-07-13 15:00:35

Yeah, I looked at that earlier, but it converts strings, not numbers :)

mikl 2009-07-13 15:11:11
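One possible workaround, assuming you still want to use a byte-oriented encoder such as zbase62: convert the number to bytes before encoding and back after decoding. This sketch only covers the int/bytes step with the standard `int.to_bytes` and `int.from_bytes`; the helper names are hypothetical and no zbase62 call is shown.

```python
def int_to_bytes(num):
    """Big-endian bytes for a non-negative integer (at least one byte)."""
    return num.to_bytes(max(1, (num.bit_length() + 7) // 8), "big")

def bytes_to_int(data):
    """Inverse of int_to_bytes."""
    return int.from_bytes(data, "big")
```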

Answer 5

+2 A:

Personally I like the solution from Baishampayan, mostly because it strips the confusing characters.

For completeness, and for a solution with better performance, this post shows a way to use the Python base64 module.

Van Gale 2009-07-14 03:55:44

As mentioned in my comment to Williham Totland, Python's base64 module is suboptimal for encoding numbers, since it is designed for byte strings.

mikl 2010-04-02 15:37:52

Answer 6

+2 A:

I have a Python library for doing exactly that here: http://www.djangosnippets.org/snippets/1431/

Simon Willison 2009-09-28 10:59:20

Answer 7

+3 A:

The following decoder-maker works with any reasonable base, has a much tidier loop, and gives an explicit error message when it meets an invalid character.

def base_n_decoder(alphabet):
    """Return a decoder for a base-n encoded string

    Argument:
    - `alphabet`: The alphabet used for encoding
    """
    base = len(alphabet)
    # Map each character to its digit value for O(1) lookup
    char_value = {c: v for v, c in enumerate(alphabet)}
    def f(string):
        num = 0
        try:
            for char in string:
                num = num * base + char_value[char]
        except KeyError:
            raise ValueError('Unexpected character %r' % char) from None
        return num
    return f

if __name__ == "__main__":
    func = base_n_decoder('0123456789abcdef')
    # 'abqdef' contains 'q', which is not a hex digit
    for test in ('0', 'f', '2020', 'ffff', 'abqdef'):
        try:
            print(test, func(test))
        except ValueError as e:
            print(test, e)
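For symmetry, a matching encoder-maker might look like the sketch below. `base_n_encoder` is a hypothetical name, not part of the answer; it assumes non-negative input and returns the alphabet's first character for 0.

```python
def base_n_encoder(alphabet):
    """Return an encoder for non-negative integers, mirroring base_n_decoder."""
    base = len(alphabet)
    def f(num):
        if num < 0:
            raise ValueError('Negative number %r' % num)
        digits = []
        while True:
            num, rem = divmod(num, base)
            digits.append(alphabet[rem])
            if num == 0:  # checked after appending, so 0 encodes as one digit
                break
        return ''.join(reversed(digits))
    return f
```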

John Machin 2009-09-28 14:20:24

Answer 8

+1 A:

I once wrote a script to do this as well; I think it's quite elegant :)

import string

# 64 symbols: digits, ASCII letters, plus '_' and '@'
BASE_LIST = string.digits + string.ascii_letters + '_@'
BASE_DICT = dict((c, i) for i, c in enumerate(BASE_LIST))

def base_decode(s, reverse_base=BASE_DICT):
    length = len(reverse_base)
    ret = 0
    # Walk the string from the least significant (rightmost) character
    for i, c in enumerate(s[::-1]):
        ret += (length ** i) * reverse_base[c]

    return ret

def base_encode(integer, base=BASE_LIST):
    if integer == 0:
        return base[0]
    length = len(base)
    ret = ''
    while integer != 0:
        ret = base[integer % length] + ret
        integer //= length  # integer division

    return ret

Example usage:

for i in range(100):
    print(i, base_decode(base_encode(i)), base_encode(i))

WoLpH 2010-03-30 23:58:33

That is neat, thank you. I like the shortness :)

mikl 2010-04-02 15:39:10

Base 62 conversion in Python
