ansaurus

Question

What is the most efficient way to encode an arbitrary GUID into readable ASCII (33-127)?

Answer 1

A:

Simply go Base64.

leonbloy 2010-05-13 14:39:38

Answer 2

A:

Assuming that all of your GUIDs are being generated by the same algorithm, you can save 4 bits by not encoding the algorithm nibble, before applying any other encoding :-|

Damien_The_Unbeliever 2010-05-13 14:42:31

Answer 3

+2 A:

Using the full range from 33 (what's wrong wirh space, incidentally?) to 127 gives you 95 possible characters. Expressing the 2^128 possible values of guid in base 95 will use 20 characters. This (modulo things like dropping nybbles that will be constant) is the best you can do. Save yourself the trouble - use base 64.

AakashM 2010-05-13 14:44:39

128 bits in base95 is 19.5 digits to be more accurate. If you consider the length free, you'll get that on average if omit most significant zeros.

Ants Aasma 2010-05-13 14:52:04

Answer 4

A:

An arbitrary GUID? The "naive" algorithm will produce optimal results. The only way to compress a GUID further is to make use of patterns in the data excluded by your "arbitrary" constraint.

jemfinch 2010-05-13 14:45:17

Did you read the question? He's asking how to encode it more efficiently, not how to compress it.

Nick Johnson 2010-05-13 15:51:06

Do you understand that encoding 128 bits more efficiently than 22 6-bit characters *requires* some form of compression and some pattern in the input?

jemfinch 2010-05-13 23:37:03

Answer 5

+5 A:

You have 95 characters available -- so, more than 6 bits, but not quite as many as 7 (about 6.57 actually). You could use 128/log2(95) = about 19.48 characters, to encode into 20 characters. If saving 2 characters in the encoded form is worth the loss of readability to you, something like (pseudocode):

char encoded[21];
long long guid;    // 128 bits number

for(int i=0; i<20; ++i) {
  encoded[i] = chr(guid % 95 + 33);
  guid /= 95;
}
encoded[20] = chr(0);

which is basically the generic "encode a number in some base" code, except that there's no need to reverse the "digits" since the order's arbitrary anyway (and little-endian is more direct and natural). To get back the guid from the encoded string is, in a very similar way, the polynomial computation in base 95 (after subtracting 33 from each digit of course):

guid = 0;

for(int i=0; i<20; ++i) {
  guid *= 95;
  guid += ord(encoded[i]) - 33;
}

essentially using Horner's approach to polynomial evaluation.

Alex Martelli 2010-05-13 14:46:35

Answer 6

+5 A:

Use Base 85. See section 4.1. Why 85? of A Compact Representation of IPv6 Addresses

An IPv6 address, like a GUID is made up of eight 16-bit pieces.

Paul Butcher 2010-05-13 14:55:20

Looks good, but the characters '%' and '?' are not suitable if the encoding is to be used in database queries. We replace them with ':' and '.', because we do not care about IPv6 formatting issues.

mark 2010-05-16 08:10:24

ansaurus

tags:

views:

answers:

What is the most efficient way to encode an arbitrary GUID into readable ASCII (33-127)?

related questions