ansaurus

Question

Python UUID represented as special characters

Answer 1

+2 A:

How important is it to you to "squeeze" the representation by 18.75%, i.e., from 32 to 26 characters? Because, if saving this small percentage of bytes isn't absolutely crucial, something like uid.hex.upper().replace('D','Z') will do what you ask (not using the whole alphabet you make available, but the only cost of this is missing that 18.75% "squeezing").
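As a minimal sketch of the simple approach just described (Python 3; the sample UUID is borrowed from another answer in this thread, and the variable names are only illustrative):

```python
import uuid

uid = uuid.UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')

# Uppercase hex with 'D' swapped for 'Z', which never occurs in hex output,
# so the substitution can be undone without ambiguity.
compact = uid.hex.upper().replace('D', 'Z')
print(compact)  # A8098C1AF86E11ZABZ1A00112444BE1E

# Reverse the substitution to recover the original UUID.
restored = uuid.UUID(compact.replace('Z', 'D'))

# Fraction of characters saved by going from 32 to 26 characters.
saving = (32 - 26) / 32  # 0.1875
```

The result stays at 32 characters; the 18.75% is what a full 32-symbol alphabet would save on top of this.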

If squeezing down every last byte is crucial, I'd work on substrings of 20 bits each -- that's 5 hex characters, or 4 characters in your funky alphabet. There are 6 of those (plus 8 bits left over, for which you can use the hex.upper().replace approach above, since there's nothing to gain in doing anything fancier). You can easily get the substrings by slicing .hex and turn each into an int with int(theslice, 16). Then you can apply basically the same algorithm you're using above -- but the arithmetic is all done on much smaller numbers, so the speed gain should be material. Also, don't build the string by looping on += -- make a list of all the "digits" and ''.join them at the end; that's also a performance improvement.
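The chunked scheme could be sketched roughly as follows (Python 3). The alphabet is an assumption: it is the 32-character string that appears in other answers of this thread, not something stated in this answer. The function names `encode_chunked` and `decode_chunked` are hypothetical.

```python
import uuid

# Assumed alphabet, copied from other answers in this thread.
ALPHABET = 'ABCEGHJKLMNPRSTVWXYZ1234567890+='

def encode_chunked(uid):
    """Encode a uuid.UUID as 26 characters: six 20-bit chunks plus an 8-bit tail."""
    hex_str = uid.hex  # 32 hex characters
    digits = []
    for start in range(0, 30, 5):
        value = int(hex_str[start:start + 5], 16)  # 5 hex chars = 20 bits
        chunk = []
        for _ in range(4):  # 20 bits = 4 base-32 digits
            value, rem = divmod(value, 32)
            chunk.append(ALPHABET[rem])
        digits.extend(reversed(chunk))  # most significant digit first
    # The remaining 2 hex characters (8 bits) use the simple substitution.
    digits.append(hex_str[30:].upper().replace('D', 'Z'))
    return ''.join(digits)

def decode_chunked(text):
    """Invert encode_chunked."""
    hex_parts = []
    for start in range(0, 24, 4):
        value = 0
        for ch in text[start:start + 4]:
            value = value * 32 + ALPHABET.index(ch)
        hex_parts.append('%05x' % value)  # keep leading zeros of each chunk
    hex_parts.append(text[24:].replace('Z', 'D').lower())
    return uuid.UUID(''.join(hex_parts))
```

Note that the two tail characters come from the hex digits rather than from the assumed alphabet, so a tail can contain 'F', which that alphabet does not include.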

Alex Martelli 2010-02-17 04:24:54

Agree re. space squeezing, good point, though there's an (astronomically remote) possibility of collisions with a .replace('O','D')/etc. The more important point would be to have a reduced, albeit "funky", alphabet that uses fewer visually ambiguous characters (e.g. "D", "O", "Q", and "0").

Brian M. Hunt 2010-02-18 00:02:02

@Brian, I don't see what "collisions" could occur if you just use `uid.hex.upper().replace('D', 'Z')`. 'D' is the only character in the hex set potentially confusable with another ('0', the digit zero).

Alex Martelli 2010-02-18 02:08:30

@Alex: Oh sorry -- I was thinking the algorithm suggested in the second paragraph would apply `replace('D','Z')` to the 20-bit substrings.

Brian M. Hunt 2010-02-19 16:18:14

Answer 2

+1 A:

>>> OCRf = 'ABCEGHJKLMNPRSTVWXYZ1234567890+='
>>> uuid = 'a8098c1a-f86e-11da-bd1a-00112444be1e'
>>> binstr = bin(int(uuid.replace("-",""),16))[2:].zfill(130)
>>> ocfstr = "".join(OCRf[int(binstr[i:i+5],2)] for i in range(0,130,5))
>>> ocfstr
'HLBJJB2+ETCKSP7JWACGYGMVW+'

To convert back again

>>> "%032x"%(int("".join(bin(OCRf.index(i))[2:].zfill(5) for i in ocfstr),2))
'a8098c1af86e11dabd1a00112444be1e'

gnibbler 2010-02-17 04:34:38

There's no need for the fanciness with binstr - you can just fetch the .bytes property on a UUID to get its binary representation.

Nick Johnson 2010-02-17 08:45:05

@Nick Johnson, Can you explain what you mean? I don't see how I can regroup the `.bytes` as base 32

gnibbler 2010-02-17 19:52:47

Just encode it using base32, or any of the other encoding schemes suggested here. My point is that if you have a real UUID object, the third line of your snippet can be replaced with just "uuid.bytes".

Nick Johnson 2010-02-19 08:28:18
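One possible reading of this exchange, sketched in Python 3: a real UUID object already exposes its value through `.bytes` and `.int`, so the hex-string parsing can be skipped and the 5-bit groups taken by shifting. The name `OCRF` and the shift-based regrouping are illustrative choices, not something either commenter wrote.

```python
import uuid

OCRF = 'ABCEGHJKLMNPRSTVWXYZ1234567890+='  # alphabet from the answer above

uid = uuid.UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')

# Three routes to the same 128-bit integer.
from_hex = int(str(uid).replace('-', ''), 16)
from_bytes = int.from_bytes(uid.bytes, 'big')
assert from_hex == from_bytes == uid.int

# Regroup into 26 five-bit digits, most significant first, without
# building an intermediate bit string (26 * 5 = 130 bits, top 2 are zero).
encoded = ''.join(OCRF[(uid.int >> shift) & 31] for shift in range(125, -1, -5))
print(encoded)  # HLBJJB2+ETCKSP7JWACGYGMVW+
```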

Answer 3

+1 A:

# Python 2
import base64
import string
import uuid

transtbl = string.maketrans(
  'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567',
  'ABCEGHJKLMNPRSTVWXYZ1234567890+='
)

uuidstr = uuid.uuid1()

print base64.b32encode(str(uuidstr).replace('-', '').decode('hex')).rstrip('=').translate(transtbl)

Yes, this method does make me a bit ill, thanks for asking.
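For readers on Python 3, a rough equivalent might look like the sketch below. It assumes the same two alphabets as the snippet above; the names `encode`, `decode`, `TO_OCR` and `FROM_OCR` are made up for illustration. Because base32 pads at the end of the data rather than the front, its output will generally differ from the string produced in Answer 2.

```python
import base64
import uuid

STANDARD = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567'  # standard base32 alphabet
OCR = 'ABCEGHJKLMNPRSTVWXYZ1234567890+='       # alphabet used in this thread

TO_OCR = str.maketrans(STANDARD, OCR)
FROM_OCR = str.maketrans(OCR, STANDARD)

def encode(uid):
    """Base32-encode the 16 raw bytes, drop the padding, then swap alphabets."""
    raw = base64.b32encode(uid.bytes).decode('ascii')  # 26 chars + 6 '=' padding
    # Strip padding BEFORE translating: '=' is also a symbol in the OCR alphabet.
    return raw.rstrip('=').translate(TO_OCR)

def decode(text):
    """Swap alphabets back, restore the padding, then base32-decode."""
    standard = text.translate(FROM_OCR)
    return uuid.UUID(bytes=base64.b32decode(standard + '=' * 6))

uid = uuid.UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')
token = encode(uid)
assert len(token) == 26
assert decode(token) == uid
```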

Ignacio Vazquez-Abrams 2010-02-17 08:04:13
