I'm parsing a binary file format. It encodes an integer using four bytes in a way that will naturally fit within C#'s uint type.

What is the most idiomatic C# way to implement this function:

uint ReadUint(byte[] buffer);

Assume the buffer contains 4 elements. A complete answer might consider some of the common byte orderings caused by little/big endian assumptions in the file, and document the one(s) it chooses to parse.

+1  A: 

As someone coming from C, this is how I currently implement this function:

static uint ReadLength(byte[] buffer)
{
    uint result = ((uint) buffer[3]) << 24;
    result |= ((uint) buffer[2]) << 16;
    result |= ((uint) buffer[1]) << 8;
    result |= buffer[0];
    return result;
}

This parses a format that Wikipedia claims is laid out in little-endian fashion, on a .NET implementation running on i386/Vista.
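Since the question asks about both common byte orderings, here is a rough sketch of the same shifting approach for each one. The class and method names are made up for illustration, and both methods assume the buffer holds at least four bytes.

```csharp
static class EndianSketch
{
    // Little-endian: buffer[0] is the least significant byte.
    public static uint ReadUInt32LittleEndian(byte[] buffer)
    {
        return (uint)buffer[0]
            | ((uint)buffer[1] << 8)
            | ((uint)buffer[2] << 16)
            | ((uint)buffer[3] << 24);
    }

    // Big-endian: buffer[0] is the most significant byte.
    public static uint ReadUInt32BigEndian(byte[] buffer)
    {
        return ((uint)buffer[0] << 24)
            | ((uint)buffer[1] << 16)
            | ((uint)buffer[2] << 8)
            | (uint)buffer[3];
    }
}
```

For the bytes { 0x78, 0x56, 0x34, 0x12 }, the little-endian reader gives 0x12345678 and the big-endian reader gives 0x78563412. Neither depends on the byte order of the machine running the code.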

John McAleely
Note that bitwise | will be simpler than numeric +... see my (updated) answer for an example.
Marc Gravell
why would | be 'simpler' than + ?
John McAleely
My understanding is that the bitwise operations involve less work for the CPU than maths, since it is just applying a bit mask. There may be additional overflow checks etc for + (but not |) in a "checked" context (note that by default C# is "unchecked").
Marc Gravell
I'll add evidence to a (wiki) reply...
Marc Gravell
I would heartily recommend changing the + into |. It's (to me) completely unintuitive, as the primary operation is to do a bitwise "joining together" of two values, it's *not* a numerical addition. Using + raises the question of this actually being an addition, which is (to me) very confusing. If there ever is a bit set where the new bits go, the function will fail due to carry, which (to me) indicates that + is the wrong operation to use.
unwind
I see from the later answer that | is clearly faster in C# than +. In my head this is all just arithmetic of some sort, so + is how I happened to implement it first time around. I've edited this answer to | though, for the benefit of those who read this in the future.
John McAleely
A: 

Assuming that you want to read a stream of them (as your code would suggest) I would say that this is pretty close to the de facto standard way:

MemoryStream ms = new MemoryStream(new byte[100]);
BinaryReader br = new BinaryReader(ms);
uint q = br.ReadUInt32();
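To match the signature in the question, the same idea could be wrapped up roughly as follows. This is only a sketch: the class name is hypothetical, and it relies on BinaryReader reading multi-byte integers as little-endian.

```csharp
using System.IO;

static class BinaryReaderSketch
{
    // Reads the first four bytes of the buffer as a little-endian uint.
    public static uint ReadUint(byte[] buffer)
    {
        using (var ms = new MemoryStream(buffer))
        using (var br = new BinaryReader(ms))
        {
            // Throws EndOfStreamException if fewer than four bytes remain.
            return br.ReadUInt32();
        }
    }
}
```

For a one-off four-byte buffer this allocates two objects per call, so it probably only pays off when you are already reading a sequence of values from a stream.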
John Gietzen
Oddly, I've done a *lot* of work with binary, and I've rarely used this class...
Marc Gravell
Same here. I actually prefer to bit-twiddle.
John Gietzen
+2  A: 

I would normally use the BitConverter class for this. In your case the BitConverter.ToUInt32() method.

Rune Grimstad
You probably mean ToUInt32 (see OP)
Marc Gravell
Oops! hehe... Good point
Rune Grimstad
+5  A: 

The most basic (but a little dangerous re endianness) is:

return BitConverter.ToUInt32(buffer, 0);

Other than that, bit-shifting is fine (as per your own reply) - or you can use Jon's EndianBitConverter in MiscUtil, which handles the translations.
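As a hedged aside for readers on newer runtimes: as far as I know, .NET Core 2.1 and later (or the System.Memory package) ship System.Buffers.Binary.BinaryPrimitives, which names the byte order explicitly. A minimal sketch, with a made-up wrapper class, assuming that type is available to you:

```csharp
using System.Buffers.Binary;

static class PrimitivesSketch
{
    // byte[] converts implicitly to ReadOnlySpan<byte>.
    public static uint ReadLittle(byte[] buffer)
    {
        return BinaryPrimitives.ReadUInt32LittleEndian(buffer);
    }

    public static uint ReadBig(byte[] buffer)
    {
        return BinaryPrimitives.ReadUInt32BigEndian(buffer);
    }
}
```

Both calls throw if the span is shorter than four bytes, and neither depends on the host's byte order.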

(edit)

The little-endian bit-shifting version I use in protobuf-net is pretty much identical to your version - I just read the bytes in ascending order and combine them with bitwise OR rather than numeric addition:

return ((uint)buffer[0])
        | (((uint)buffer[1]) << 8)
        | (((uint)buffer[2]) << 16)
        | (((uint)buffer[3]) << 24);
Marc Gravell
Am I right in thinking that running this on a 'big endian' .NET platform would break, since it would try to parse the binary format assuming a big endian byte order?
John McAleely
Which "this"? BitConverter.ToUInt32 would return a different value on Itanium (IA64) due to endianness, which probably means a problem, yes. The bit-shifting approach is immune to system-endianness.
Marc Gravell
My comment was posted before you put in the bit shifting solution :-)
John McAleely
A: 
byte[] ba = new byte[] { 0x10, 0xFF, 0x11, 0x01 };
var ui = BitConverter.ToUInt32(ba, 0);

Use the BitConverter Class.

JP Alioto
A: 

Simplest way is just

uint val = System.BitConverter.ToUInt32(buffer, 0);

This uses the current system endianness, which may or may not be what you want.
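If the file is known to be little-endian, one way to keep BitConverter but stop depending on the host is to check BitConverter.IsLittleEndian and reverse a copy of the bytes when needed. A minimal sketch, with a hypothetical class name, assuming the value sits in the first four bytes of the buffer:

```csharp
using System;

static class BitConverterSketch
{
    // Reads a little-endian uint regardless of the host's byte order.
    public static uint ReadLittleEndian(byte[] buffer)
    {
        if (BitConverter.IsLittleEndian)
        {
            // Host order already matches the file order.
            return BitConverter.ToUInt32(buffer, 0);
        }

        // Big-endian host: reverse a copy so the caller's buffer is untouched.
        byte[] copy = new byte[4];
        Array.Copy(buffer, copy, 4);
        Array.Reverse(copy);
        return BitConverter.ToUInt32(copy, 0);
    }
}
```

For a big-endian file the condition would be inverted. The copy costs an allocation on the slow path, which is one reason the bit-shifting versions elsewhere in this thread may be preferable.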

tnyfst
+1  A: 

This reply is actually an extended comment (hence wiki) comparing the performance of BitConverter and bitshifting using + vs |; it applies when micro-optimising only!!

Results first:

BitConverter: 972ms, chk=1855032704
Bitwise: 740ms, chk=1855032704
ReadLength: 1316ms, chk=1855032704

Or results if tweaked to allow non-zero base offsets:

BitConverter: 905ms, chk=1855032704
Bitwise: 1058ms, chk=1855032704
ReadLength: 1244ms, chk=1855032704

And the code:

using System;
using System.Diagnostics;
static class Program
{
    static void Main()
    {
        byte[] buffer = BitConverter.GetBytes((uint)123);
        const int LOOP = 50000000;
        uint chk = 0;
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < LOOP; i++)
        {
            chk += BitConverter.ToUInt32(buffer, 0);
        }
        watch.Stop();
        Console.WriteLine("BitConverter: " + watch.ElapsedMilliseconds
            + "ms, chk=" + chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int i = 0; i < LOOP; i++)
        {
            chk += Bitwise(buffer);
        }
        watch.Stop();
        Console.WriteLine("Bitwise: " + watch.ElapsedMilliseconds
            + "ms, chk=" + chk);

        chk = 0;
        watch = Stopwatch.StartNew();
        for (int i = 0; i < LOOP; i++)
        {
            chk += ReadLength(buffer);
        }
        watch.Stop();
        Console.WriteLine("ReadLength: " + watch.ElapsedMilliseconds
            + "ms, chk=" + chk);

        Console.ReadKey();
    }
    static uint Bitwise(byte[] buffer)
    {
        return ((uint)buffer[0])
            | (((uint)buffer[1]) << 8)
            | (((uint)buffer[2]) << 16)
            | (((uint)buffer[3]) << 24);
    }
    static uint ReadLength(byte[] buffer)
    {
        uint result = ((uint)buffer[3]) << 24;
        result += ((uint)buffer[2]) << 16;
        result += ((uint)buffer[1]) << 8;
        result += buffer[0];
        return result;
    }
}
Marc Gravell