ansaurus

Question

Encode integer as variable-length big-endian byte-array

Answer 1

+3 A:

private static int Encode(int value, byte[] buffer, int index)
{
    int length = 0;
    int valueCopy = value;
    while (valueCopy != 0)
    {
        valueCopy >>= 8;
        length++;
    }
    for (int i = 0; i < length; i++)
    {
        buffer[index + length - i - 1] = (byte)value;
        value >>= 8;
    }
    return length;
}

Pavel Radzivilovsky 2010-07-04 20:35:57

The double-loop is somewhat ugly, but not nearly as ugly as what I've come up with so far. +1.

dtb 2010-07-04 20:53:07

@dtb: If your input size is fixed, i.e. 32-bit or 64-bit, you can entirely do away with the loops and hard-code the conversion - it will even be a little faster.

casablanca 2010-07-04 20:55:11

@casablanca: Well, my input size is always 32-bit (a non-negative `int`), but I want to omit leading zeros when writing the value to the byte[]. Can you post an answer that shows how a solution without loops looks like? (Without omitting leading zeros it's simple; I'm struggling with the variable-length output.)

dtb 2010-07-04 21:01:24

there's inherent need for the loop to find how long is the target number.

Pavel Radzivilovsky 2010-07-04 21:07:12

@digEmAll: You can only unroll the loop if you know the number of iterations, which isn't obvious in this case.@Pavel: You can break down the loop into `if` statements that check if each byte of the number (from left to right) is non-zero, and if so, append it to the buffer.

casablanca 2010-07-04 21:13:07

@casablanca: Yes, I thought to it some minutes after writing, comment removed ;)

digEmAll 2010-07-04 21:18:20

@Pavel Radzivilovsky. I think this line `buffer[length - i - 1] = (byte)value;` should be `buffer[index + length - i - 1] = (byte)value;`. Despite the double loop, your method looks like the more readable, so +1.

Juan Pablo Califano 2010-07-04 21:37:34

@Juan, thanks, added.

Pavel Radzivilovsky 2010-07-05 07:25:48

Answer 2

A:

Note that you are saving the value to a variable Length byte array. If you save these byteArray, you need save also the length.

You can see the protected Functions Write7BitEncodedInt from BinaryWriter and Read7BitEncodedInt from BinaryReader.

These functions save storage on disk for positive numbers. A number 0-128 needs only one byte.

Microsoft uses these functions to store/retrieve the String length prefix when save to Stream.

To use these Functions, you can create own Class derived from BinaryReader / BinaryWriter.

x77 2010-07-04 21:00:17

Yes, I'm also saving the length so I can properly decode the byte[] back to an integer. Unfortunately I can't use Write7BitEncodedInt because I need to write the integer value exactly in the format described in my question.

dtb 2010-07-04 21:04:48

Answer 3

+2 A:

As per OP's request, here is a version without loops for a 32-bit number:

private static int Encode(int value, byte[] buffer, int index)
{
    byte temp;
    bool leading = true;

    temp = (value >> 24) & 0xFF;
    if (temp > 0) {
      buffer[index++] = temp;
      leading = false;
    }

    temp = (value >> 16) & 0xFF;
    if (temp > 0 || leading == false) {
      buffer[index++] = temp;
      leading = false;
    }

    temp = (value >> 8) & 0xFF;
    if (temp > 0 || leading == false) {
      buffer[index++] = temp;
      leading = false;
    }

    temp = value & 0xFF;
    buffer[index++] = temp;

    return index;
}

Version using a loop for 32-bit numbers:

private static int Encode(int value, byte[] buffer, int index)
{
    int length = 0;

    for (int i = 3; i >= 0; i++) {
      byte temp = (byte)(value >> (8 * i));
      if (temp > 0 || length > 0) {
        buffer[index++] = temp;
        length++;
      }
    }

    return length;
}

Note that this version doesn't write anything if the input is just 0.

casablanca 2010-07-04 21:17:12

-1. This is wrong. It doesn't exclude leading zeros. It excludes every byte whose value is zero, instead. Try it with 0x00120056 and you'll see the result is not correct.

Juan Pablo Califano 2010-07-04 21:23:05

Sorry, I overlooked that part. It's fixed now.

casablanca 2010-07-04 21:25:14

No problem. Removed downvote...

Juan Pablo Califano 2010-07-04 21:26:56

@casablanca: I like this. Please do me a small favour and turn this back into a loop (the shift is just 8*i). Can you also change it so it returns the number of bytes written?

dtb 2010-07-04 21:29:58

@dtb, putting this in a loop will mean that you have a conditional branch in the loop. Once you have the first non-zero byte you can blindly pack the rest of the bytes without the conditional branch.

Chris Taylor 2010-07-04 21:36:54

@Chris Taylor: I'm looking for a solution that is simple, readable, maintainable, not necessarily one that takes a few nanoseconds less than other solution. If you see two loops, you need to figure out what each loop's doing, while if there's only one loop, you know immediately that this loop simply writes some bytes to the byte[] in some way.

dtb 2010-07-04 21:44:40

Pavel Radzivilovsky 2010-07-05 07:27:56

ansaurus

tags:

views:

answers:

Encode integer as variable-length big-endian byte-array

related questions