I want to compress a binary stream. I know that after each '1' there is a higher probability of finding a '0', and after each '0' there is a higher probability of finding a '1'. How should I encode it? I was thinking about Rice codes, but I didn't get very far... Thanks in advance for any reply.

+2  A: 

Have you tried simple Huffman coding? It may not save much, but if one of the pairs '10' and '01' is much more probable than '00' or '11', you can remap that pair to '0' and the other three pairs to '10', '110' and '111'.
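A minimal sketch of that 2-bit remapping, assuming '10' is the most frequent pair and that the stream is handled as a string of '0'/'1' characters. The table and the function names `encode_pairs` and `decode_pairs` are made up for illustration:

```python
# Illustrative prefix code for 2-bit chunks.
# Assumption: '10' is the most frequent pair, so it gets the 1-bit code.
CODE = {"10": "0", "01": "10", "00": "110", "11": "111"}
DECODE = {v: k for k, v in CODE.items()}


def encode_pairs(bits: str) -> str:
    """Replace each 2-bit chunk of `bits` by its prefix code."""
    if len(bits) % 2:
        raise ValueError("input length must be even")
    return "".join(CODE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_pairs(code: str) -> str:
    """Invert encode_pairs by reading one codeword at a time."""
    out, current = [], ""
    for bit in code:
        current += bit
        if current in DECODE:  # prefix-free, so the first match is the codeword
            out.append(DECODE[current])
            current = ""
    if current:
        raise ValueError("truncated code stream")
    return "".join(out)
```

With this table the 8-bit input 10101001 becomes the 5-bit string 00010, while an input made mostly of '00' or '11' pairs would grow instead.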

Of course, this won't be the best choice, as it splits your stream into 2-bit chunks and only optimizes one case. However, it can be refined by measuring probabilities for bigger blocks, such as 4 or 8 bits. For example, in the 8-bit case 10101010 and 01010101 will occur more often than 00000000 and 11111111.
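As a rough sketch of that refinement, the following counts k-bit blocks in a sample and builds a Huffman table from the counts. The helper names `block_counts` and `huffman_code` are hypothetical, and the stream is again assumed to be a string of '0'/'1' characters:

```python
import heapq
from collections import Counter
from itertools import count


def block_counts(bits: str, k: int) -> Counter:
    """Count non-overlapping k-bit blocks (a trailing partial block is ignored)."""
    return Counter(bits[i:i + k] for i in range(0, len(bits) - k + 1, k))


def huffman_code(freqs: dict) -> dict:
    """Build a Huffman table {block: codeword} from block frequencies."""
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single symbol still needs a 1-bit codeword
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)  # the two least frequent subtrees
        f2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]
```

The table has to be stored or agreed on in advance so the decoder can use it, which eats into the savings for short inputs.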

You might get even better results with arithmetic coding, or with any compression scheme that uses a model based on the bit probabilities.
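Before writing a full coder, it may help to estimate what such a model could gain. The sketch below is not an arithmetic coder; it only computes the empirical entropy of a sample when each bit is predicted from the previous one, which is roughly the rate an arithmetic or range coder driven by that model could approach. The function name `order1_entropy` is made up for illustration:

```python
import math
from collections import Counter


def order1_entropy(bits: str) -> float:
    """Empirical bits per symbol when each bit is conditioned on the previous bit."""
    if len(bits) < 2:
        raise ValueError("need at least two bits")
    pairs = Counter(zip(bits, bits[1:]))  # counts of (previous, current)
    context = Counter(bits[:-1])          # how often each bit acts as 'previous'
    n = len(bits) - 1
    entropy = 0.0
    for (prev, _cur), c in pairs.items():
        # weight of this transition times the cost of coding it in its context
        entropy -= (c / n) * math.log2(c / context[prev])
    return entropy
```

A result close to 1.0 means the previous bit tells you almost nothing; the further below 1.0 it falls, the more a context-based coder should save.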

Another simple approach would be to invert every second bit. The probabilities you mention will tend to produce many alternating runs like 0101010; inverting every second bit turns them into runs like 1111111, which usual compression algorithms generally handle better. But the success of this method depends on how big the "probability gap" really is.
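A minimal sketch of the inversion step, assuming the stream is a string of '0'/'1' characters. The name `flip_alternate` is made up; whether the runs come out as zeros or ones depends only on which positions you choose to flip:

```python
def flip_alternate(bits: str) -> str:
    """Invert the bits at odd positions (0-based); applying it twice restores the input."""
    flipped = {"0": "1", "1": "0"}
    return "".join(
        flipped[b] if i % 2 else b  # even positions pass through unchanged
        for i, b in enumerate(bits)
    )
```

Here `flip_alternate("0101010")` yields a run of seven zeros, and the decoder recovers the original by calling the same function again after decompression.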

schnaader
Hi! I've tried Huffman but, as you note, it does not give optimal results... However, thanks for suggesting arithmetic coding. It seems like the right choice; I will give it a try. Thanks!
zakk
Arithmetic coding is patented; use range coding instead.
Anton Tykhyy