ansaurus

Question

Is there a way to produce a binary diff of two byte arrays in C# .NET?

Answer 1

+2 A:

Crikey! That's all a bit complex. What's wrong with a run-length encoding of the XOR of the two arrays? It does the encoding and decoding in one pass and should be reasonably efficient in space, as most of the values will be zero, but you could re-compress the RLE data further if required.
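A minimal sketch of that idea, assuming both arrays have the same length. The class name XorRle, the (run length, value) pair layout and the 255-byte run cap are illustrative assumptions, not something specified in the answer:

```csharp
using System;
using System.Collections.Generic;

static class XorRle
{
    // Encodes (a XOR b) as a sequence of (runLength, value) byte pairs.
    // Runs are capped at 255 so the length fits in a single byte.
    public static byte[] Encode(byte[] a, byte[] b)
    {
        if (a == null) throw new ArgumentNullException("a");
        if (b == null) throw new ArgumentNullException("b");
        if (a.Length != b.Length)
            throw new ArgumentException("Arrays must be the same length.");

        var output = new List<byte>();
        int i = 0;
        while (i < a.Length)
        {
            byte value = (byte)(a[i] ^ b[i]);
            int run = 1;
            while (i + run < a.Length && run < 255
                   && (byte)(a[i + run] ^ b[i + run]) == value)
            {
                run++;
            }
            output.Add((byte)run);
            output.Add(value);
            i += run;
        }
        return output.ToArray();
    }

    // Rebuilds "a" from "b" by XOR-ing each decoded run back in.
    public static byte[] Decode(byte[] b, byte[] encoded)
    {
        if (b == null) throw new ArgumentNullException("b");
        if (encoded == null) throw new ArgumentNullException("encoded");

        var a = new byte[b.Length];
        int pos = 0;
        for (int k = 0; k + 1 < encoded.Length; k += 2)
        {
            int run = encoded[k];
            byte value = encoded[k + 1];
            if (pos + run > b.Length)
                throw new ArgumentException("Encoded diff is longer than the target array.");
            for (int j = 0; j < run; j++, pos++)
            {
                a[pos] = (byte)(b[pos] ^ value);
            }
        }
        return a;
    }
}
```

Long runs of zero (unchanged bytes) collapse to two bytes per 255 positions, which is where the space saving would come from; densely changed data can come out up to twice the size of the input.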

Dipstick 2009-12-12 16:41:48

XOR is wholly unsuitable to such things, unless it's impossible to get a real diff-engine to run with high enough speed. Just scrolling a website one pixel in any direction will produce massive amounts of differences, and those differences, due to XOR, will be worse to compress than just compressing the new image without diffing it.

Lasse V. Karlsen 2009-12-15 11:03:05

Answer 2

+4 A:

If they're guaranteed to be the same size - actually, the same dimensions - then I'm not seeing the importance of all this hashing and binary searches and other overhead. You can simply compare the two byte-by-byte in a loop, and if they don't match, add a "point" to your diff containing both the index and the value in A. To reverse the process, you don't need to look at every byte because you already have indexes.

If the two arrays differ by, say, just 1 byte, then you'll end up with a diff structure that's 5 bytes in size (assuming you use an Int32 for the index), and takes exactly 1 iteration to mutate B back into A. In general the process is O(n) for the diff and O(m) for the revert, where m is the total number of points that actually changed. I'm not an expert on data structures but I doubt you'll be able to come up with something more efficient.

So, something like this:

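// Records each index where the arrays differ, along with the original value from "a".
// Assumes a and b have the same length (needs System.Collections.Generic for List<T>).
Diff GetDiff(byte[] a, byte[] b)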
{
    Diff diff = new Diff();
    for (int i = 0; i < a.Length; i++)
    {
        if (a[i] != b[i])
        {
            diff.Points.Add(new DiffPoint(i, a[i]));
        }
    }
    return diff;
}

// Mutator method - turns "b" into the original "a"
void ApplyDiff(byte[] b, Diff diff)
{
    foreach (DiffPoint point in diff.Points)
    {
        b[point.Index] = point.Value;
    }
}

// Copy method - recreates "a" leaving "b" intact
byte[] ApplyDiffCopy(byte[] b, Diff diff)
{
    byte[] a = new byte[b.Length];
    int startIndex = 0;
    foreach (DiffPoint point in diff.Points)
    {
        for (int i = startIndex; i < point.Index; i++)
        {
            a[i] = b[i];
        }
        a[point.Index] = point.Value;
        startIndex = point.Index + 1;
    }
    for (int j = startIndex; j < b.Length; j++)
    {
        a[j] = b[j];
    }
    return a;
}

struct DiffPoint
{
    public int Index;
    public byte Value;

    public DiffPoint(int index, byte value) : this()
    {
        this.Index = index;
        this.Value = value;
    }
}

class Diff
{
    public Diff()
    {
        Points = new List<DiffPoint>();
    }

    public List<DiffPoint> Points { get; private set; }
}

There's a lot of looping in ApplyDiffCopy, but if you work it out you'll see that it performs only one assignment per byte, so it is still a single pass over the array. Of course, if you don't need a copy and just want to mutate B, then the first ApplyDiff method will be extremely fast if there aren't many actual differences.

And obviously I haven't done much error-checking here. You would want to write your version a bit more defensively (verify array lengths, etc.).
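As a rough sketch of what "more defensively" might look like, the version below adds null, length and index checks. To stay self-contained it stores points as KeyValuePair<int, byte> instead of the DiffPoint struct, and the class name CheckedDiff is hypothetical:

```csharp
using System;
using System.Collections.Generic;

static class CheckedDiff
{
    // Same algorithm as GetDiff, with parameter validation up front.
    public static List<KeyValuePair<int, byte>> GetDiff(byte[] a, byte[] b)
    {
        if (a == null) throw new ArgumentNullException("a");
        if (b == null) throw new ArgumentNullException("b");
        if (a.Length != b.Length)
            throw new ArgumentException("Both arrays must have the same length.");

        var points = new List<KeyValuePair<int, byte>>();
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i])
            {
                // Key = index, Value = original byte from "a".
                points.Add(new KeyValuePair<int, byte>(i, a[i]));
            }
        }
        return points;
    }

    // Mutates "b" back into "a", rejecting indexes outside the target array.
    public static void ApplyDiff(byte[] b, List<KeyValuePair<int, byte>> points)
    {
        if (b == null) throw new ArgumentNullException("b");
        if (points == null) throw new ArgumentNullException("points");

        foreach (KeyValuePair<int, byte> point in points)
        {
            if (point.Key < 0 || point.Key >= b.Length)
                throw new ArgumentOutOfRangeException("points", "Diff index is outside the target array.");
            b[point.Key] = point.Value;
        }
    }
}
```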

If I've correctly understood the assumptions here and the problem you're trying to solve, then the original ApplyDiff method is going to be the fastest way to restore the original image.

Aaronaught 2009-12-12 16:46:50

+1. I found that your code provides a better solution than mine, and it was published earlier. That's why I deleted my answer.

Roman Boiko 2009-12-12 18:08:01

I would suggest using pointer arithmetic with unsafe code for higher speed. And please make your answer shorter (especially the code). It looks like people can't see its benefits behind lots of information, or just don't read it to the end, and vote for other answers. Anyway, it's your answer; please don't treat my suggestion as telling you what to do. :)

Roman Boiko 2009-12-12 18:49:04

+1. Looks like a complete answer given the information that was provided.

Misha 2009-12-12 18:57:54

This solution is great - I didn't accept it right away as I was looking into the RLE of the XOR of the two arrays. Thank you for taking the time to post such a complete solution.

CuriousCoder 2009-12-12 19:02:11

Unsafe code removes array bounds checking so it will improve performance; it also makes it absolutely critical to perform the appropriate parameter validation at the beginning. As for shortening the code, I'm not really sure how to do that without removing something important, but I'll move the class declarations to the bottom to make the algorithms more obvious.
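A sketch of what the pointer-based variant could look like, with the validation done before any pointer is taken. It assumes the project is compiled with unsafe code enabled (/unsafe or AllowUnsafeBlocks); the class name UnsafeDiff and the KeyValuePair<int, byte> point type are illustrative choices. Whether it is measurably faster depends on the runtime, so it would be worth benchmarking first:

```csharp
using System;
using System.Collections.Generic;

static class UnsafeDiff
{
    // Requires compiling with unsafe code enabled.
    public static unsafe List<KeyValuePair<int, byte>> GetDiff(byte[] a, byte[] b)
    {
        // Validate first: inside the fixed block nothing checks bounds for us.
        if (a == null) throw new ArgumentNullException("a");
        if (b == null) throw new ArgumentNullException("b");
        if (a.Length != b.Length)
            throw new ArgumentException("Both arrays must have the same length.");

        var points = new List<KeyValuePair<int, byte>>();
        int length = a.Length;

        // Pin both arrays so the garbage collector cannot move them while we read.
        fixed (byte* pa = a, pb = b)
        {
            for (int i = 0; i < length; i++)
            {
                if (pa[i] != pb[i])
                {
                    points.Add(new KeyValuePair<int, byte>(i, pa[i]));
                }
            }
        }
        return points;
    }
}
```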

Aaronaught 2009-12-12 19:02:43

For big differences, this will produce more data than just shipping the new image verbatim. For instance, scrolling a website just one pixel up will produce massive amounts of single-pixel-differences. Are you sure this is what you want? What about posting example screenshots so we can give you some real data?
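To make the break-even point concrete: with the 5 bytes per point mentioned in the answer (an Int32 index plus one byte of value), the diff is only smaller than the raw array while fewer than about one byte in five has changed. A small sketch of that check, where DiffSize and DiffIsSmaller are hypothetical names:

```csharp
static class DiffSize
{
    // 4-byte Int32 index + 1-byte value, as assumed in the answer.
    const int BytesPerPoint = sizeof(int) + sizeof(byte);

    // True when serializing the diff points takes fewer bytes than
    // sending the whole array verbatim.
    public static bool DiffIsSmaller(int changedCount, int arrayLength)
    {
        return (long)changedCount * BytesPerPoint < arrayLength;
    }
}
```

A caller could count the changed bytes first and fall back to sending the new array as-is whenever this returns false.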

Lasse V. Karlsen 2009-12-15 11:00:19

I agree, and I could suggest all kinds of optimizations and compression strategies (I've had to implement them in the past), but if this is going to be used in production then it would be better to simply license an off-the-shelf protocol like Blaze or use an open-source one like rdesktop. Why reinvent the wheel?

Aaronaught 2009-12-15 16:53:50

Answer 3

A:

You could use the BitArray class (or use Reflector to see how it's implemented, so you can avoid copying your arrays and speed it up even more):

        // Requires: using System.Collections;
        byte[] s1 = new byte[] {0,1,2,3,4,5,6};
        byte[] s2 = new byte[] {0,1,2,4,4,6,6};
        var orig1 = new BitArray(s1);
        var orig2 = new BitArray(s2);
        var diff = orig1.Xor(orig2); // Xor modifies orig1 in place and returns it
        byte[] diffArray = new byte[s1.Length];
        diff.CopyTo(diffArray, 0); // here we have a byte diff array of s1 and s2

        var reconstruct = orig2.Xor(diff); // likewise modifies orig2 in place
        reconstruct.CopyTo(diffArray, 0); // diffArray is now the same as s1

Mikael Svenson 2009-12-12 18:31:59
