views:

69

answers:

3

I need to encode streams of 8 byte such that encoded stream has only digits (zero to nine) in them. Are their any standard encoding mechanism for doing this? If there are multiple ways to do it, which one is efficient in terms of length of encoded string (shorter is better)?

+4  A: 

Treat the 8 bytes as a 64-bit unsigned integer and convert it to decimal and pad it to the left with zeroes. That should make for the shortest possible string, as it utilizes all available digits in all positions except the starting one.

If your data is not uniformly distributed there are other alternatives, looking into Huffman-coding so that the most commonly data patterns can be represented by shorter strings. One way is to use the first digit to encode the length of the string. All numbers except 1 in the first position can be treated as a length specifier. That way the maximum length of 20 digits will never be exceeded. (The 20th digit can only be 0 or 1, the highest 64-bit number is 18,446,744,073,709,551,615.) The exact interpretation mapping of the other digits into lengths should be based on the distribution of your patterns. If you have 10 patterns which are occuring VERY often you could e.g. reserv "0" to mean that one digit represents a complete sequence.

Any such more complicated encoding will however introduce the need for more complex packing/unpacking code and maybe even lookup tables, so it might not be worth the effort.

Anders Abel
...64bit (unsigned) integer...
Rowland Shaw
But it will also be variable length, which requires a delimiter between blocks in the stream, which would be....? (Since all ten digits have been used.) :-)
T.J. Crowder
Thanks for the comments, I've corrected and extended my answer.
Anders Abel
Thanks for the answer! Living in binary world makes you alien to decimal world after some time. :)
Hemant
Your first suggestion will *always* require 20 digits per 8-byte value, which doesn't seem to address the OP's length request (unless most of his values really are above 10,000,000,000,000,000,000).
T.J. Crowder
As I understand variable length was suggested also in this answer. For example"0xx" = x, "1xxxx" = x + 100, ..., "8xx..." (18 digits) = x + 10**16so 8999999999999999999 = 1009999999999999999 ... "9xxx" (20 digits)
ony
@ony: Right. As I said, *"Your **first** suggestion..."* The second half of the second suggestion might have legs, though I'd like to see it expanded/developed further. You don't have five bits to work with, although you do have some *values* to work with that may well pan out.
T.J. Crowder
@TJ You're right about bits vs values. I've updated and tried to clarify.
Anders Abel
+1  A: 

The result that has the shortest length is to convert it to decimal directly. This leads to the highest value being 18446744073709551615, but conversion can be difficult without arbitrary length integer capability.

The next longest is to convert it to octal as one chunk. This results in a maximum length of 22, with a value of 1777777777777777777777. This requires only shifts to convert, and can be handled easily enough.

The next longest is to convert it to either octal or decimal bytewise. This results in a length of 24, with 8 repetitions of 377 or 255 respectively. Converting back and forth is trivial, and is left as an exercise for the reader.

Ignacio Vazquez-Abrams
Thanks for the answer. As for the first option being difficult without arbitrary length integer capability, it is not really an issue. You can split the block in 4 byte integers, convert them to decimal individually and then concatenate them. Since a 4 byte unsigned value takes maximum 10 digits, we still have 20 digits for 8 byte block. What do you think?
Hemant
That's certainly a workable solution, as is breaking it up into 4 2-byte blocks of 5 digits each.
Ignacio Vazquez-Abrams
With the 2 times 4 byte solution you need to watch for the boundaries. Is 111 a 1 in the upper bytes and a 11 in the lower bytes or vice versa or what? So you need to always use exactly 20 digits with this method.
hstoerr
+4  A: 
T.J. Crowder