views:

594

answers:

5

Problem:
I have two arrays that can possibly be different lengths. I need to iterate through both arrays and find similarities, additions, and deletions.

What's the fastest and most efficient way to accomplish this in C#?

Edit: The arrays are pre-sorted and they can contain anywhere between 50-100 items. Also, there aren't any constraints on speed and/or memory usage (however, no one likes a memory hog;)


For example:

String[] Foo_Old = {"test1", "test2", "test3"};
String[] Foo_New = {"test1", "test2", "test4", "test5"};

AND

String[] Bar_Old = {"test1", "test2", "test4"};
String[] Bar_New = {"test1", "test3"};


Differences:

(with respect to the Foo_New array)

[Same]    "test1"
[Same]    "test2"
[Removed] "test3"
[Added]   "test4"
[Added]   "test5"

(with respect to the Bar_New array)

[Same]    "test1"
[Removed] "test2"
[Removed] "test4"
[Added]   "test3"


Thanks for your help!

A: 

Since your arrays are sorted, you should be able to just go through the arrays simultaneously, and in one pass and determine if each element is in the other array. (Similar to the merge step in merge sort.) You can see a sample of that below:

string[] oldVersion = { "test1", "test2", "test3" };
string[] newVersion = { "test1", "test2", "test4", "test5" };

int oldIndex = 0, newIndex = 0;

while ((oldIndex < oldVersion.Length) && (newIndex < newVersion.Length)) {
   int comparison = oldVersion[oldIndex].CompareTo(newVersion[newIndex]);

   if (comparison < 0)
      Console.WriteLine("[Removed]\t" + oldVersion[oldIndex++]);
   else if (comparison > 0)
      Console.WriteLine("[Added]\t\t" + newVersion[newIndex++]);
   else {
      Console.WriteLine("[Same]\t\t" + oldVersion[oldIndex++]);
      newIndex++;
   }
}

while (oldIndex < oldVersion.Length)
   Console.WriteLine("[Removed]\t" + oldVersion[oldIndex++]);

while (newIndex < newVersion.Length)
   Console.WriteLine("[Added]\t\t" + newVersion[newIndex++]);

Alternatively you'd need to go through one array, and for each element in this array, do a single pass of the other array looking for a match.

Edit: JP has a good suggestion on how to do this using the framework. Although, assuming the arrays are sorted, the benefit of my approach is that you only have to do one pass to find all the results. You would not have to do three passes.

Rob Rolnick
This could come in handy if I have to port my code to another language that doesn't rely on .NET. Thanks editing and giving the example!
Sean
A: 

What you're describing sounds very much like a simple diff algorithm. Diff algorithms are most prominently used in source control projects. Search around google and you can probably find some good stuff.

Spencer Ruport
Exactly! I want to compare directory tree differences.
Sean
+4  A: 

You can use Except and Intersect ...

var Foo_Old = new[] { "test1", "test2", "test3" }; 
var Foo_New = new[] { "test1", "test2", "test4", "test5" };

var diff = Foo_New.Except( Foo_Old );
var inter = Foo_New.Intersect( Foo_Old );
var rem = Foo_Old.Except(Foo_New);

foreach (var s in diff)
{
    Console.WriteLine("Added " + s);
}

foreach (var s in inter)
{
    Console.WriteLine("Same " + s);
}

foreach (var s in rem)
{
    Console.WriteLine("Removed " + s);
}
JP Alioto
Thanks, JP. This is exactly what I needed.
Sean
note this is slightly less efficient and less encapsulated than my implementation ... not that it matters that much, it solves the problem
Sam Saffron
@Sam, I actually like both of your answers, but I can't select two answers. :( I'll probably combine them.
Sean
+1  A: 

I wrote this a while back:

Usage:

foreach (var diff in Foo_Old.Diff(Foo_New)){
   Console.WriteLine ("{0} action performed on {1}",diff.DiffAction,diff.Value);
}

Implementation:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace LinqExtensions {

    enum DiffAction {
       Added,
       Removed,
       Same
    }

    class DiffPair<T> {
        public T Value { get; set; }
        public DiffAction DiffAction { get; set; }
    }

    static class DiffExtension {
        public static IEnumerable<DiffPair<T>> Diff<T>
                 (
                     this IEnumerable<T> original,
                     IEnumerable<T> target 
                 ) {

            Dictionary<T, DiffAction> results = new Dictionary<T, DiffAction>();

            foreach (var item in original) {
                results[item] = DiffAction.Removed;
            }

            foreach (var item in target) {
                if (results.ContainsKey(item)) {
                    results[item] = DiffAction.Same;
                } else {
                    results[item] = DiffAction.Added;
                }
            }
            return results.Select(
                pair => new DiffPair<T> {
                    Value=pair.Key, 
                    DiffAction = pair.Value
                });
        }
    }

}
Sam Saffron
+1  A: 

I went ahead and hand-coded one and use the example in the accepted answer, and the hand-coded one performs a little better. I handled outputting my strings a little differently. Other factors to consider include whether the Except make a sorted copy of the array (since it cannot assume it's sorted) or whether it makes some kind of hash or a linear search (it's actually restricted to IEnumerable - for very large arrays which are already sorted, this could be a problem). You could change mine to compare IEnumerable (which is more general) instead of IComparable[].

static void ArrayCompare(IComparable[] Old, IComparable[] New)
{
    int lpOld = 0;
    int lpNew = 0;
    int OldLength = Old.Length;
    int NewLength = New.Length;
    while (lpOld < OldLength || lpNew < NewLength)
    {
        int compare;

        if (lpOld >= OldLength) compare = 1;
        else if (lpNew >= NewLength) compare = -1;
        else compare = Old[lpOld].CompareTo(New[lpNew]);

        if (compare < 0)
        {
            Debug.WriteLine(string.Format("[Removed] {0}", Old[lpOld].ToString()));
            lpOld++;
        }
        else if (compare > 0)
        {
            Debug.WriteLine(string.Format("[Added] {0}", New[lpNew].ToString()));
            lpNew++;
        }
        else
        {
            Debug.WriteLine(string.Format("[Same] {0}", Old[lpOld].ToString()));
            lpOld++;
            lpNew++;
        }
    }
}

static void ArrayCompare2(IComparable[] Old, IComparable[] New) {
    var diff = New.Except( Old );
    var inter = New.Intersect( Old );
    var rem = Old.Except(New);

    foreach (var s in diff)
    {
        Debug.WriteLine("Added " + s);
    }

    foreach (var s in inter)
    {
        Debug.WriteLine("Same " + s);
    }

    foreach (var s in rem)
    {
        Debug.WriteLine("Removed " + s);
    }
}

static void Main(string[] args)
{
    String[] Foo_Old = {"test1", "test2", "test3"};
    String[] Foo_New = {"test1", "test2", "test4", "test5"};
    String[] Bar_Old = {"test1", "test2", "test4"};
    String[] Bar_New = {"test1", "test3"};

    Stopwatch w1 = new Stopwatch();
    w1.Start();
    for (int lp = 0; lp < 10000; lp++)
    {
        ArrayCompare(Foo_Old, Foo_New);
        ArrayCompare(Bar_Old, Bar_New);
    }
    w1.Stop();

    Stopwatch w2 = new Stopwatch();
    w2.Start();
    for (int lp = 0; lp < 10000; lp++)
    {
        ArrayCompare2(Foo_Old, Foo_New);
        ArrayCompare2(Bar_Old, Bar_New);
    }
    w2.Stop();

    Debug.WriteLine(w1.Elapsed.ToString());
    Debug.WriteLine(w2.Elapsed.ToString());
}
Cade Roux
Thanks, for taking the time to hand-code a solution and test its speed!
Sean