tags:

views:

440

answers:

4

I've got an algorithm that compares two sorted lists, and I'm planning to make it more LINQ-y.

My first question: is there anything like this in LINQ already?

If not, I'm planning on writing it as an extension method, something like this:

public static class SortedCompareExtension
{
    public static IEnumerable<Pair<T, U>>
        CompareTo<T,U>(this IEnumerable<T> left,
                       IEnumerable<U> right,
                       Func<T, U, int> comparison)
    { /* ... */ }
}

It'll return new Pair<T, U>(t, u) if both elements are equivalent, new Pair<T, U>(t, null) if the element only exists in the left-hand list, etc.

My second question: CompareTo is not a great name for this. What is it? Is it comparing, collating, correlating? What?

A: 

That algorithm that is linked, is very close to a "Full Outer Merge Join". It differs only in how duplicates are treated.

is there anything like this in LINQ already?

It'll return new Pair(t, u) if both elements are equivalent, new Pair(t, null) if the element only exists in the left-hand list, etc.

From left side only, GroupJoin is similiar.

David B
+4  A: 

You are Synchronizing the lists, though the name has unfortunate connotations of thread synchronization.

I would say that 'ReconcileOrdered' (perhaps OrderedReconcile) is a reasonable name (it is probably very important to indicate in the function's name that the arguments must both be ordered for it to behave correctly

ShuggyCoUk
I like 'Reconcile'. I think I'll go with OrderedReconcileWith().
Roger Lipscombe
+1  A: 

How about Diff? I don't believe there's anything already in LINQ to do everything, but there's Except and Intersect. It sounds like your code is mostly a mixture of the two - except those are set operations, whereas you've got sorted lists. It that mostly a case of efficiency, or is there a semantic difference in your case? (One example would be repeated entries.)

Personally I'm not sure I'd use a pair in this case. There's only ever one value involved, isn't there? Isn't it really a case of Value, PresentInLeft (bool) and PresentInRight (bool)? I'm sure each representation has its pros and cons, and you can certainly expose either API from the same underlying data...

Jon Skeet
It's for efficiency. The IEnumerable<T> might actually live on the other side of a network connection (returned from a server), and might be quite large, so I generate them in coroutines.
Roger Lipscombe
Even if a value appears in both lists, it might still need synchronising (e.g. if only comparing by one or two properties, the other properties might need reconciling), so I need both.
Roger Lipscombe
Okay, so it's only a sorted of projected equivalence, rather than equivalence of the whole entity. That's fair enough.
Jon Skeet
A: 

Here's the code I end up with. I quite like it, particularly the selector that means that the caller can decide what to return in each scenario:

public static class OrderedReconcileExtension
{
    public static IEnumerable<TResult>
        ReconcileWith<T, U, TResult>(this IEnumerable<T> left,
                                     IEnumerable<U> right,
                                     Func<T, U, int> comparison,
                                     Func<T, U, TResult> selector)
    {
        return ReconcileHelper(new EnumerableIterator<T>(left),
                               new EnumerableIterator<U>(right),
                               comparison, selector);
    }

    private static IEnumerable<TResult>
        ReconcileHelper<T, U, TResult>(EnumerableIterator<T> left,
                                       EnumerableIterator<U> right,
                                       Func<T, U, int> comparison,
                                       Func<T, U, TResult> selector)
    {
        while (left.IsValid && right.IsValid)
        {
            // While left < right, the items in left aren't in right
            while (left.IsValid && right.IsValid && 
                   comparison(left.Current, right.Current) < 0)
            {
                yield return selector(left.Current, default(U));
                left.MoveNext();
            }

            // While right < left, the items in right aren't in left
            while (left.IsValid && right.IsValid &&
                   comparison(left.Current, right.Current) > 0)
            {
                yield return selector(default(T), right.Current);
                right.MoveNext();
            }

            // While left == right, the items are in both
            while (left.IsValid && right.IsValid &&
                   comparison(left.Current, right.Current) == 0)
            {
                yield return selector(left.Current, right.Current);
                left.MoveNext();
                right.MoveNext();
            }
        }

        // Mop up.
        while (left.IsValid)
        {
            yield return selector(left.Current, default(U));
            left.MoveNext();
        }

        while (right.IsValid)
        {
            yield return selector(default(T), right.Current);
            right.MoveNext();
        }
    }
}

EnumerableIterator is exactly the same as in the code I posted in the other question.

Update: "exactly the same" apart from IsValid is changed to HasCurrent. Pick one.

Roger Lipscombe