tags:

views:

795

answers:

3

Let's say I have these two arrays:

var array1 = new[] {"A", "B", "C"};
var array2 = new[] {"A", "C", "D"};

I would like to get the differences between the two. I know I could write this in just a few lines of code, but I want to make sure I'm not missing a built in language feature or a LINQ extension method.

Ideally, I would end up with the following three results:

  • Items not in array1, but are in array2 ("D")
  • Items not in array2, but are in array1 ("B")
  • Items that are in both

Thanks in advance!

+22  A: 

If you've got LINQ available to you, you can use Except and Distinct. The sets you asked for in the question are respectively:

- array2.Except(array1)
- array1.Except(array2)
- array1.Intersect(array2)
Jon Skeet
Do you know what kind of performance guarantees are one this? Presumably Except would have to make a sorted copy of each array first. I can't find any of this on MSDN.
Eclipse
No, it doesn't make a sorted copy. It creates a set from the excluded sequence, and then iterates over the source sequence, yielding any element which isn't in the excluded sequence.
Jon Skeet
(When I say "set" I mean "hash set".)
Jon Skeet
Thanks, I was hoping this would be something I could use in my application, but since my data is presorted and very large, the speed loss is too high.
Eclipse
+5  A: 

from the MSDN 101 LINQ samples....

public void Linq52() {
    int[] numbersA = { 0, 2, 4, 5, 6, 8, 9 };
    int[] numbersB = { 1, 3, 5, 7, 8 };

    IEnumerable<int> aOnlyNumbers = numbersA.Except(numbersB);

    Console.WriteLine("Numbers in first array but not second array:");
    foreach (var n in aOnlyNumbers) {
        Console.WriteLine(n);
    }
}
Muad'Dib
+1  A: 

I've had to do things similar to this with very large sets of data. If you're dealing with a few thousand or so, use the Linq stuff since it's much clearer. But if you know that your arrays are pre-sorted, running a merge like this can do it significantly faster, since it only makes one pass through the data and doesn't need to allocate as much memory as the Linq version.

int iA = 0;
int iB = 0;
List<int> inA = new List<int>();
List<int> inB = new List<int>();
List<int> inBoth = new List<int>();
while (iA < numbersA.Length && iB < numbersB.Length)
{
    if (numbersA[iA++] < numbersB[iB++])
    {
     inA.Add(numbersA[iA++]);
    }
    else if (numbersA[iA++] == numbersB[iB++])
    {
     inBoth.Add(numbersA[iA++]);
     ++iB;
    }
    else
    {
     inB.Add(numbersB[iB++]);
    }
}
while (iA < numbersA.Length)
{
    inA.Add(numbersA[iA++]);
}
while (iB < numbersB.Length)
{
    inB.Add(numbersB[iB++]);
}

Again, this is really only needed if you are dealing with hundreds of thousands of values.

Eclipse