views:

129

answers:

4

Whats the best way of doing variable length encoding of an unsigned integer value in C# ?


"The actual intent is to append a variable length encoded integer (bytes) to a file header."

For ex: "Content-Length" - Http Header

public string somefunction(int data)
{
  //returns the binary format of the data byte as a string
  string data_string = Convert.ToString(data,2);

  //the MSB will cntain 1 bit, so we have to add 7 msb bits
  string MSB = "1000 000";

  //the LSB will contain 7 bits, so we have to add 1 msb bit
  string LSB = "0";

  //the binary string will be split into 1(MSB) and 7 bits(LSB)
  string[] MSB_array = new string[1];
  string[] LSB_array = new string[7];

  //now split the string
  //by adding the MSB bit to the MSB string
  MSB += data_string[0];

  //and by adding the 7 LSB bits to LSB string
  for (int i = 1; i < 8; i++)
  {
      LSB += data_string[i];
      if (LSB.Length == 4)
          LSB += " ";
  }

  //concatenate the two MSB and LSB into one final string
  string final_result = MSB +" "+ LSB;
  return final_result;
}

The above function does return the values as {125,94} for the integer 223 as opposed to the logic below which returns {223,1}.But the size of the array as per VLQ spec cannot be done.

Can this be achieved with some changes in the logic below.


I have written some code which does that ....

A: 

If small values are more common than large ones you can use Golomb coding.

Laurion Burchall
Hi , I appreciate the quick response.I want it in bytes.Can you provide a working implementation in C# if its available.
Kreez
A: 

Perhaps, this post may help you.

VinayC
+2  A: 

A method I have used, which makes smaller values use fewer bytes, is to encode 7 bits of data + 1 bit of overhead pr. byte.

The encoding works only for positive values starting with zero, but can be modified if necessary to handle negative values as well.

The way the encoding works is like this:

  • Grab the lowest 7 bits of your value and store them in a byte, this is what you're going to output
  • Shift the value 7 bits to the right, getting rid of those 7 bits you just grabbed
  • If the value is non-zero (ie. after you shifted away 7 bits from it), set the high bit of the byte you're going to output before you output it
  • Output the byte
  • If the value is non-zero (ie. same check that resulted in setting the high bit), go back and repeat the steps from the start

To decode:

  • Start at bit-position 0
  • Read one byte from the file
  • Store whether the high bit is set, and mask it away
  • OR in the rest of the byte into your final value, at the bit-position you're at
  • If the high bit was set, increase the bit-position by 7, and repeat the steps, skipping the first one (don't reset the bit-position)
          39    32 31    24 23    16 15     8 7      0
value:            |DDDDDDDD|CCCCCCCC|BBBBBBBB|AAAAAAAA|
encoded: |0000DDDD|xDDDDCCC|xCCCCCBB|xBBBBBBA|xAAAAAAA| (note, stored in reverse order)

As you can see, the encoded value might occupy one additional byte that is just half-way used, due to the overhead of the control bits. If you expand this to a 64-bit value, the additional byte will be completely used, so there will still only be one byte of extra overhead.

Note: Since the encoding stores values one byte at a time, always in the same order, big- or little-endian systems will not change the layout of this. The least significant byte is always stored first, etc.

Ranges and their encoded size:

          0 -         127 : 1 byte
        128 -      16.383 : 2 bytes
     16.384 -   2.097.151 : 3 bytes
  2.097.152 - 268.435.455 : 4 bytes
268.435.456 -   max-int32 : 5 bytes

Here's C# implementations for both:

void Main()
{
    using (FileStream stream = new FileStream(@"c:\temp\test.dat", FileMode.Create))
    using (BinaryWriter writer = new BinaryWriter(stream))
        writer.EncodeInt32(123456789);

    using (FileStream stream = new FileStream(@"c:\temp\test.dat", FileMode.Open))
    using (BinaryReader reader = new BinaryReader(stream))
        reader.DecodeInt32().Dump();
}

// Define other methods and classes here

public static class Extensions
{
    /// <summary>
    /// Encodes the specified <see cref="Int32"/> value with a variable number of
    /// bytes, and writes the encoded bytes to the specified writer.
    /// </summary>
    /// <param name="writer">
    /// The <see cref="BinaryWriter"/> to write the encoded value to.
    /// </param>
    /// <param name="value">
    /// The <see cref="Int32"/> value to encode and write to the <paramref name="writer"/>.
    /// </param>
    /// <exception cref="ArgumentNullException">
    /// <para><paramref name="writer"/> is <c>null</c>.</para>
    /// </exception>
    /// <exception cref="ArgumentOutOfRangeException">
    /// <para><paramref name="value"/> is less than 0.</para>
    /// </exception>
    /// <remarks>
    /// See <see cref="DecodeInt32"/> for how to decode the value back from
    /// a <see cref="BinaryReader"/>.
    /// </remarks>
    public static void EncodeInt32(this BinaryWriter writer, int value)
    {
        if (writer == null)
            throw new ArgumentNullException("writer");
        if (value < 0)
            throw new ArgumentOutOfRangeException("value", value, "value must be 0 or greater");

        bool first = true;
        while (first || value > 0)
        {
            first = false;
            byte lower7bits = (byte)(value & 0x7f);
            value >>= 7;
            if (value > 0)
                lower7bits |= 128;
            writer.Write(lower7bits);
        }
    }

    /// <summary>
    /// Decodes a <see cref="Int32"/> value from a variable number of
    /// bytes, originally encoded with <see cref="EncodeInt32"/> from the specified reader.
    /// </summary>
    /// <param name="reader">
    /// The <see cref="BinaryReader"/> to read the encoded value from.
    /// </param>
    /// <returns>
    /// The decoded <see cref="Int32"/> value.
    /// </returns>
    /// <exception cref="ArgumentNullException">
    /// <para><paramref name="reader"/> is <c>null</c>.</para>
    /// </exception>
    public static int DecodeInt32(this BinaryReader reader)
    {
        if (reader == null)
            throw new ArgumentNullException("reader");

        bool more = true;
        int value = 0;
        int shift = 0;
        while (more)
        {
            byte lower7bits = reader.ReadByte();
            more = (lower7bits & 128) != 0;
            value |= (lower7bits & 0x7f) << shift;
            shift += 7;
        }
        return value;
    }
}
Lasse V. Karlsen
Brilliant!! Thanks LVK. Tests work perfectly.I shall make the necessary changes as required.
Kreez
how many algorithms are available to do this sort of encoding , I was just wondering when a integer 223 was encoded using the above logic it returned a 2 byte array with values {223,1}.I have a lib which does this encoding ,returns the byte array as {129,95}.Does this have to do something with 'endianess'.
Kreez
Doubtful, there's nothing magical or brilliant about the above algorithm, so there will probably be many variations that differ in output and approach.
Lasse V. Karlsen
I didn't know that I can't mark two solutions ... Thanks for the code Karlsen.
Kreez
A: 

I worked out a way backwards ..which does what the program) required ...

public static byte[] Int32Encoder(int value, int size)
        {
            byte[] bData = new byte[size];
            int idx = 0;
            bool first = true;

            while (value > 0)
            {
                byte byt = (byte)(value & 127);
                value >>= 7;
                if (first)
                {
                    bData[idx] = byt;
                    first = false;
                }
                else
                {
                    byt = (byte)(byt | 128);
                    bData[idx] = byt;
                }
                idx += 1;
            }

            Array.Reverse(bData);//To change the endianess
            return bData;
        }
Kreez
Next time, consider asking for what you need, instead of something similar. If you had specified that you did not just need *any* variable-length encoding, but a *specific* one, you might've gotten a better and more correct answer from me and others to begin with, instead of wasting time trying to adopt the solution given to you to your (unmentioned) criteria.
Lasse V. Karlsen
I appreciate that.I did not know that there were many variants of the technique that yielded different results,I posted that way and this is the first time I had this req.And also I needed to understand what the ext app needed by trying out different methods.I realized only when the above logic did not work and I had posted the same in one of my comments above.
Kreez