ansaurus

Question

Answer 1

+1 A:

You hit the nail on the head.

There are many encoding schemes, such as the gamma and delta codes, both of which belong to the Elias family of codes. These are bit-level codes, as opposed to the byte-level code you used, and are useful when the data is strongly skewed towards small numbers (which can often be achieved by encoding deltas instead of absolute values).
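To make the bit-level idea concrete, here is a minimal sketch of Elias gamma coding combined with delta (gap) encoding. The function names are made up for illustration, and the bits are kept as a string of '0' and '1' characters for readability; a real codec would pack them into bytes.

```python
def elias_gamma_encode(n):
    """Return the Elias gamma code of a positive integer as a '0'/'1' string."""
    if n < 1:
        raise ValueError("Elias gamma is defined only for n >= 1")
    binary = bin(n)[2:]                      # binary digits; the first is always 1
    return "0" * (len(binary) - 1) + binary  # unary length prefix, then the digits


def elias_gamma_decode(bits):
    """Decode a concatenation of Elias gamma codes back into a list of integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":                # count the zeros of the length prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def to_gaps(sorted_values):
    """Replace each value with its difference from the previous one (first from 0)."""
    return [b - a for a, b in zip([0] + sorted_values[:-1], sorted_values)]


gaps = to_gaps([3, 7, 8, 20])                # strictly increasing, so every gap >= 1
encoded = "".join(elias_gamma_encode(g) for g in gaps)
```

For the list above the gaps are 3, 4, 1 and 12, which take 16 bits in total, whereas a byte-level code needs at least one byte per value.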

Bit-level encoding schemes are much more difficult to implement than byte-level schemes, and the additional CPU burden may outweigh the time saved by having less data to read. That said, most modern CPUs have "highest-bit" and "lowest-bit" instructions (which find the position of the highest or lowest set bit) that dramatically improve the performance of bit-level codecs. As CPU speeds continue to outpace RAM speeds, bit-level schemes will become more attractive, though the simplicity of byte-level codecs is a big factor too.

Marcelo Cantos 2010-03-28 00:12:07

Answer 2

A:

Yes, you are right: you save space by encoding a value in one byte instead of four. Generally, you will save memory if the values you are encoding are much smaller than the maximum value that fits in your original fixed-width encoding.
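As a rough illustration of where the savings come from, the sketch below computes how many bytes a value needs when each byte carries 7 payload bits, and compares that with a fixed 4-byte integer. The helper name and the 4-byte baseline are assumptions for the example.

```python
FIXED_WIDTH = 4  # bytes per value in the assumed 32-bit fixed-width encoding


def varbyte_length(n):
    """Number of bytes needed when every byte carries 7 bits of the value."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    return max(1, (n.bit_length() + 6) // 7)  # ceil(bits / 7), at least one byte


for value in (5, 127, 128, 16_383, 2**21, 2**32 - 1):
    print(value, varbyte_length(value), FIXED_WIDTH)
```

Values up to 127 fit in one byte and 128 already needs two. At the other end, 2**32 - 1 needs five bytes, one more than the fixed-width form, which is why the scheme only pays off when most values are small.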

Keith Randall 2010-03-28 00:13:22

Answer 3

+3 A:

The way you do it is by reserving one of the bits to mean "I'm not done with the value." Usually, that's the most significant bit.

When you read a byte, you process the lower 7 bits. If the most significant bit is 1, then you know there's at least one more byte to read, and you repeat the process: shift the value accumulated so far left by 7 bits and add the next 7 bits to it.

The MIDI format uses that exact encoding to represent lengths of MIDI events, in the following manner:

ExpectedValue = 0
ReadNext:
    byte = ReadFromFile
    ExpectedValue = ExpectedValue + (byte AND 0x7f)
    if byte > 127 then
        ExpectedValue = ExpectedValue SHL 7
        Goto ReadNext
Done

For example, the value 0x80 would be represented using the bytes 0x81 0x00. If you run the algorithm on those two bytes, you'll see that you get the right value.
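The same scheme can be sketched in Python as follows. The function names are invented for this example. The decoder shifts before adding each new group instead of after, which gives the same result as the steps above.

```python
def vlq_decode(data):
    """Decode one value: 7 payload bits per byte, high bit set means 'more follows'."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)  # make room, then add the low 7 bits
        if byte < 0x80:                       # high bit clear: this was the last byte
            return value                      # any bytes after it are ignored
    raise ValueError("input ended before a byte with the high bit clear")


def vlq_encode(n):
    """Encode a non-negative integer, most significant 7-bit group first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    groups = [n & 0x7F]                       # final byte keeps its high bit clear
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)      # earlier bytes get the continuation bit
        n >>= 7
    return bytes(reversed(groups))


assert vlq_decode(bytes([0x81, 0x00])) == 0x80
assert vlq_encode(0x80) == bytes([0x81, 0x00])
```

The two assertions at the end check the 0x80 example from the text in both directions.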

UTF-8 works similarly, but it uses a slightly more complex scheme in which the first byte tells you how many bytes you should be expecting. This allows for some error detection, since you can easily tell if the bytes you're getting match the length claimed. Wikipedia describes its structure quite well.
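Here is a small sketch of that length check, with hypothetical function names. It looks only at the bit patterns of the lead and continuation bytes. A strict UTF-8 validator also rejects overlong forms, surrogates and code points above U+10FFFF, which this example does not attempt.

```python
def utf8_sequence_length(lead):
    """Return the sequence length announced by a UTF-8 lead byte."""
    if lead < 0x80:
        return 1                      # 0xxxxxxx: single-byte (ASCII)
    if 0xC0 <= lead < 0xE0:
        return 2                      # 110xxxxx
    if 0xE0 <= lead < 0xF0:
        return 3                      # 1110xxxx
    if 0xF0 <= lead < 0xF8:
        return 4                      # 11110xxx
    raise ValueError("not a lead byte (continuation byte or 0xF8 and above)")


def has_matching_continuations(data):
    """Check that every lead byte is followed by the number of 10xxxxxx bytes it claims."""
    i = 0
    while i < len(data):
        try:
            length = utf8_sequence_length(data[i])
        except ValueError:
            return False
        tail = data[i + 1:i + length]
        if len(tail) != length - 1 or any((b & 0xC0) != 0x80 for b in tail):
            return False
        i += length
    return True
```

For instance, the three bytes of "€" (0xE2 0x82 0xAC) pass the check, while the same sequence with its last byte cut off does not.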

Michael Madsen 2010-03-28 00:18:46


Variable-byte encoding clarification
