I have 64 bit values that I want to compress by exploiting the fact that only a portion somewhere in the middle contains data and before and after that are zeroes.

Say the actual data is l bits long and padded with n 0s in front and m 0s at the end such that n + l + m = 64. Instead of transmitting / storing 64 bits, I can transmit l bits plus whatever I need to encode the position of the data in the 64-bit interval.

For example, say I was storing l, m and the data bits, then I would restore the original 64-bit pattern by reading l, reading l bits of data, reading m and shifting the data m bits to the left.
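A minimal Python sketch of that round trip (the helper names are mine, not part of the question):

```python
def encode(x):
    """Split a 64-bit value into (l, m, data): l significant bits
    followed by m trailing zeros. Hypothetical helper for illustration."""
    if x == 0:
        return 0, 0, 0                   # all-zero value: no data bits
    m = (x & -x).bit_length() - 1        # count of trailing zeros
    data = x >> m                        # the l significant bits
    return data.bit_length(), m, data

def decode(l, m, data):
    """Restore the original 64-bit pattern by shifting back."""
    return data << m

x = 0x0000_00F0_0000_0000
l, m, data = encode(x)
assert (l, m, data) == (4, 36, 15)
assert decode(l, m, data) == x
```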

The smallest overhead I could come up with is two times 6 bits for storing any two of l, n and m (each can be between 0 and 64). Is it possible to reduce that number?

+3  A: 

As you have stated the problem, no, you cannot do better than the solution you have proposed.

However, if the distribution of the zeros in the numbers is skewed, you may be able to get better compression on average by using Huffman codes or a similar technique to represent the counts. Another possibility is to use delta coding if the zero distribution is strongly correlated from one 64-bit value to the next.

In either case, you will need a variable number of bits to represent the counts of zeros. And if your assumptions about skewness or correlation turn out to be false, you may end up using more bits on average than if you had done it the simple way.
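A quick sketch of the delta-coding idea for correlated zero counts (illustrative only, with made-up example values, not code from the answer):

```python
def leading_zeros_64(x):
    """Leading zeros of x when viewed as a 64-bit value."""
    return 64 - x.bit_length()

# Consecutive values with strongly correlated leading-zero counts:
values = [0x0000_FF00_0000_0000, 0x0000_FE00_0000_0000, 0x0000_FC00_0000_0000]
counts = [leading_zeros_64(v) for v in values]
# Send the first count verbatim, then only the (mostly small) differences,
# which a variable-length code can store in very few bits.
deltas = [counts[0]] + [b - a for a, b in zip(counts, counts[1:])]
assert deltas == [16, 0, 0]
```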

Stephen C
In general, the distribution of zeroes is not skewed, but the original values appear in large groups where the leading zeroes are the same and the trailing zeroes are very similar, so delta encoding would work well. From a quick glance at Hamming codes, I couldn't see how they shorten the codeword; they're error-checking codes and actually add (validation) information, don't they?
Hanno Fietz
Did you confuse Hamming and Huffman?
starblue
@starblue: Yes I did. Corrected now. Thanks
Stephen C
+1  A: 

Your solution seems pretty good.
Huffman coding is another way to compress your values, especially if some values occur much more frequently than others.

It's not very difficult to implement, but it might be overkill if you don't have much data to transmit.

Nick D
+4  A: 

Your analysis sounds right for single values. But if you're transmitting lots of such values together, a generic entropy encoding algorithm like gzip will probably do better, since it can eliminate the strings of zeroes quite well and also exploit redundancies in the data.
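For example, using Python's zlib (the same DEFLATE algorithm gzip uses) on a batch of sparse values; the exact ratio depends on the data, so this is just a sketch:

```python
import struct
import zlib

# Pack 1000 sparse 64-bit values; a generic compressor exploits the
# long runs of zero bytes and the repetition between values.
values = [0x0000_00F0_0000_0000] * 1000
raw = b"".join(struct.pack(">Q", v) for v in values)
packed = zlib.compress(raw, 9)
assert len(packed) < len(raw)                 # far smaller than 8000 bytes
assert zlib.decompress(packed) == raw         # lossless round trip
```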

Michael Borgwardt
+1 for gzip, though it's not immediately useful for me now, it's a good thing to keep in mind, I think.
Hanno Fietz
+2  A: 

l can be from 0 to 64, so don't send l; send n and m instead, since each can be zero and neither needs to reach 64 (they only need to be able to sum to 64).

The l bits must start and end with a 1, so they do not need to be transmitted.

send 6 bits for n
send up to 6 bits for m (see below)
calculate l = 64 - (n + m)
if l = 0, the number is 0, don't send anything else
if l = 1, the number is 1 * 2^m, don't send anything else
if l = 2, the number is 3 * 2^m, don't send anything else
send the middle l - 2 bits.

Maximum overhead = 10 bits.

The reduction in the bits for m is because
if n > 32 then you know m < 32, so only needs 5 bits
if n > 48 then you know m < 16, so only needs 4 bits
if n > 56 then you know m < 8, so only needs 3 bits
if n > 60 then you know m < 4, so only needs 2 bits
if n = 63 then you know m < 2, so only needs 1 bit
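The steps above can be sketched as a size calculation in Python (zero encoded as n=63, m=1 as suggested in the comments below; this is my reading of the scheme, not code from the answer):

```python
def encoded_size(x):
    """Total bits used for one 64-bit value under the scheme above."""
    if x == 0:
        n, l = 63, 0                      # zero sent as n=63, m=1
    else:
        n = 64 - x.bit_length()           # leading zeros
        m = (x & -x).bit_length() - 1     # trailing zeros
        l = 64 - n - m
    # variable width of the m field, deduced from n:
    if n == 63:  w = 1
    elif n > 60: w = 2
    elif n > 56: w = 3
    elif n > 48: w = 4
    elif n > 32: w = 5
    else:        w = 6
    # 6 bits for n, w bits for m, and the middle l - 2 bits (if any)
    return 6 + w + max(0, l - 2)

assert encoded_size(0) == 7               # 6 bits for n plus 1 bit for m
assert encoded_size(1) == 7               # n = 63, so m fits in 1 bit
assert encoded_size(0x0000_0000_0000_00F0) == 12   # n=56, m=4, l=4
```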

Stephen Denne
10 bits is overhead... meaning at most l + 10 bits will be needed.
Stephen Denne
Right, I can leave 2 bits off the payload, too, nice! And good point about deducing the bitlength of m from the value of n.
Hanno Fietz
If you decide to always encode zero as n=63, m=1, then all the ">" tests above can become ">=", adding a few more cases where one bit fewer needs to be sent.
Stephen Denne
A: 

There are 64 possible start positions n for the sequence of ones, and the length l of the sequence can be no longer than 64 - n. So there are

r = sum(n = 0..63, 64 - n) + 1

sequences in total. The added one is for the sequence of all zeros. Doing some math yields the following.

r = 64 * 64 - (63 * 64) / 2 + 1
  = 2081

Representing 2081 possible values requires log2(2081) = 11.023 bits. Your suggestion to encode the information using two 6 bit numbers requiring 12 bits in total is hence optimal (under the assumption of equal distributions of all possible values).
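The count can be checked directly (a quick verification of the sum above):

```python
from math import log2

# r = sum over start positions n of the possible run lengths, plus the
# all-zero word, matching the closed form in the answer.
r = sum(64 - n for n in range(64)) + 1
assert r == 64 * 64 - (63 * 64) // 2 + 1 == 2081
assert 11 < log2(r) < 12          # log2(2081) is about 11.02, so 12 bits suffice
```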

Daniel Brückner
I understood the question as meaning the data portion in the middle could contain 1s and 0s, not as meaning the data portion was contiguous 1s.
Stephen Denne
You are right - I got the middle part completely wrong. Thanks for pointing that out.
Daniel Brückner
This includes the sequence of length l = 0 starting at positions n = 0..63. You only need to encode one of those, so there are only 2018 values for which 11 bits are enough.
Henrik
(Besides the fact that my answer missed the question,) the sequence of all zeros is included only once. For example, take n = 0: there are 64 sequences with 1 to 64 bits set, so (64 - n) does not include a sequence of all zeros, but (65 - n) would. This goes up to n = 63, where there is only 1 sequence with 1 bit set. The sequence of all zeros is explicitly added by the term "+ 1" after the sum.
Daniel Brückner
Yes, you are right.
Henrik