I need to encode streams of 8 bytes such that the encoded stream contains only digits (zero to nine). Is there any standard encoding mechanism for doing this? If there are multiple ways to do it, which one is the most efficient in terms of the length of the encoded string (shorter is better)?
Treat the 8 bytes as a 64-bit unsigned integer, convert it to decimal, and pad it on the left with zeroes to 20 digits. That should make for the shortest possible fixed-length string, as it utilizes all available digits in all positions except the starting one.
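A minimal sketch of that conversion in Python, in case it helps. The function names and the choice of big-endian byte order are assumptions made for illustration; any byte order works as long as the encoder and decoder agree.

```python
def encode_decimal(block: bytes) -> str:
    """Encode exactly 8 bytes as a zero-padded, 20-digit decimal string."""
    if len(block) != 8:
        raise ValueError("expected exactly 8 bytes")
    # Big-endian is an assumption; the question does not specify a byte order.
    value = int.from_bytes(block, "big")
    return f"{value:020d}"


def decode_decimal(digits: str) -> bytes:
    """Reverse encode_decimal: 20 decimal digits back to 8 bytes."""
    if len(digits) != 20 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected exactly 20 ASCII digits")
    value = int(digits)
    if value >= 1 << 64:
        raise ValueError("value does not fit in 64 bits")
    return value.to_bytes(8, "big")
```

Because every block maps to exactly 20 digits, a stream of blocks can simply be concatenated and split again every 20 characters.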
If your data is not uniformly distributed there are other alternatives: look into Huffman coding, so that the most common data patterns can be represented by shorter strings. One way is to use the first digit to encode the length of the string. All digits except 1 in the first position can be treated as a length specifier. That way the maximum length of 20 digits will never be exceeded. (The leading digit of a 20-digit value can only be 1, since the highest 64-bit number is 18,446,744,073,709,551,615; anything smaller than 10,000,000,000,000,000,000 needs at most 19 digits and so still fits in 20 with a specifier digit in front.) The exact mapping of the other digits to lengths should be based on the distribution of your patterns. If you have 10 patterns which occur VERY often you could, for example, reserve "0" to mean that one following digit represents a complete sequence.
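As a rough illustration of the length-specifier idea, here is one possible mapping sketched in Python. The specific assignment is an assumption chosen for the example, not something prescribed above: a leading "0" means one more digit follows and indexes a table of up to 10 frequent values, a leading "1" means the 20 digits are the value itself, and a leading "2" means 19 value digits follow. The names `encode_prefixed`, `decode_prefixed`, and `frequent` are hypothetical.

```python
from typing import Sequence, Tuple

LIMIT = 10 ** 19  # smallest number that needs 20 decimal digits


def encode_prefixed(value: int, frequent: Sequence[int]) -> str:
    """Encode a 64-bit value in at most 20 digits using a leading tag digit."""
    if not 0 <= value < 1 << 64:
        raise ValueError("value does not fit in 64 bits")
    table = list(frequent[:10])  # only ten entries can be indexed by one digit
    if value in table:
        return "0" + str(table.index(value))  # tag 0: table lookup, 2 digits
    if value >= LIMIT:
        return str(value)  # already 20 digits, and it starts with 1
    return "2" + f"{value:019d}"  # tag 2: 19 zero-padded digits follow


def decode_prefixed(digits: str, frequent: Sequence[int]) -> Tuple[int, int]:
    """Decode one value from the front of digits.

    Returns (value, number of digits consumed) so a stream can be walked.
    """
    tag = digits[0]
    if tag == "0":
        return frequent[int(digits[1])], 2
    if len(digits) < 20:
        raise ValueError("truncated input")
    if tag == "1":
        value = int(digits[:20])
        if value >= 1 << 64:
            raise ValueError("value does not fit in 64 bits")
        return value, 20
    if tag == "2":
        return int(digits[1:20]), 20
    raise ValueError("unassigned tag digit: " + tag)
```

Tags 3 to 9 are left unassigned here; with real data they could be given to shorter lengths that match the observed distribution.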
Any such more complicated encoding will however introduce the need for more complex packing/unpacking code and maybe even lookup tables, so it might not be worth the effort.
The shortest result comes from converting the value to decimal directly. The highest value is then 18446744073709551615 (20 digits), but the conversion can be difficult without a 64-bit unsigned or arbitrary-length integer type.
The next option, slightly longer, is to convert it to octal as one chunk. This results in a maximum length of 22, with a highest value of 1777777777777777777777. This requires only shifts and masks to convert, and can be handled easily enough.
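A small sketch of the shift-based octal conversion in Python. The function names, the fixed width of 22 digits, and big-endian byte order are assumptions for the example.

```python
def encode_octal(block: bytes) -> str:
    """Encode exactly 8 bytes as a fixed-width, 22-digit octal string."""
    if len(block) != 8:
        raise ValueError("expected exactly 8 bytes")
    value = int.from_bytes(block, "big")  # byte order is an assumption
    digits = []
    for _ in range(22):  # 22 digits * 3 bits = 66 bits, enough for 64
        digits.append(str(value & 0b111))  # lowest three bits -> one digit
        value >>= 3
    return "".join(reversed(digits))


def decode_octal(digits: str) -> bytes:
    """Reverse encode_octal using only shifts and ORs."""
    if len(digits) != 22 or any(ch not in "01234567" for ch in digits):
        raise ValueError("expected exactly 22 octal digits")
    if digits[0] not in "01":
        raise ValueError("value does not fit in 64 bits")
    value = 0
    for ch in digits:
        value = (value << 3) | int(ch)
    return value.to_bytes(8, "big")
```

The same loop translates directly to languages with only fixed-width integers, since no multiplication or division by 10 is involved.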
The longest of the three is to convert it to either octal or decimal bytewise. This results in a length of 24, with a highest value of 8 repetitions of 377 or 255 respectively. Converting back and forth is trivial, and is left as an exercise for the reader.
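For anyone who would rather skip the exercise, a minimal Python sketch of the bytewise decimal variant follows. The function names are made up for the example; the octal variant would look the same with `03o` as the format and `int(chunk, 8)` when decoding.

```python
def encode_bytewise(block: bytes) -> str:
    """Encode each byte as three zero-padded decimal digits (000 to 255)."""
    return "".join(f"{b:03d}" for b in block)


def decode_bytewise(digits: str) -> bytes:
    """Reverse encode_bytewise by reading the string three digits at a time."""
    if len(digits) % 3 != 0 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected ASCII digits in groups of three")
    # bytes() raises ValueError if any group is larger than 255.
    return bytes(int(digits[i:i + 3]) for i in range(0, len(digits), 3))
```

This variant needs no 64-bit arithmetic at all and works for input of any length, at the cost of 24 digits per 8 bytes instead of 20.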