ansaurus

Question

N-way intersection of sorted enumerables

Answer 1

+2 A:

You can use LINQ:

    public static IEnumerable<T> Intersect<T>(IEnumerable<IEnumerable<T>> enums) {
        using (var iter = enums.GetEnumerator()) {
            IEnumerable<T> result;
            if (iter.MoveNext()) {
                result = iter.Current;
                while (iter.MoveNext()) {
                    result = result.Intersect(iter.Current);
                }
            } else {
                result = Enumerable.Empty<T>();
            }
            return result;
        }
    }

This is simple, although it builds a new hash-set for every pairwise intersection. Advancing all n sequences at once (to take advantage of the sorted order) would be harder. Alternatively, you could build a single hash-set and remove the items that are missing from each subsequent sequence.
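A minimal sketch of the single hash-set idea, assuming the element type has sensible equality semantics. The class and method names (`HashSetIntersection`, `IntersectAll`) are hypothetical and chosen only for illustration; the result is materialized eagerly and its order is not guaranteed to follow the sorted input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HashSetIntersection
{
    // Hypothetical helper: seeds one HashSet from the first sequence,
    // then prunes it in place with each remaining sequence.
    public static IEnumerable<T> IntersectAll<T>(IEnumerable<IEnumerable<T>> enums)
    {
        if (enums == null) throw new ArgumentNullException(nameof(enums));

        HashSet<T> set = null;
        foreach (var seq in enums)
        {
            if (set == null)
                set = new HashSet<T>(seq);   // first sequence seeds the set
            else
                set.IntersectWith(seq);      // drop items that are missing from seq

            if (set.Count == 0) break;       // nothing left, later inputs cannot add items
        }

        // No input sequences at all: return an empty result
        return (IEnumerable<T>)set ?? Enumerable.Empty<T>();
    }
}
```

This ignores the fact that the inputs are sorted, so it is only an alternative to chaining `Intersect`, not a way of exploiting the ordering.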

Marc Gravell 2009-12-14 06:30:13

I'm looking for a less simple solution :-) My question is basically: how do I approach a solution that advances all n at once (to take advantage of sorted).

dtb 2009-12-14 06:34:40

Your second version, combined with my second attempt, looks pretty good. I'll grab some coffee and try to understand why it works.

dtb 2009-12-14 07:03:26

D'oh. This is basically `enums.Aggregate(Enumerable.Empty<T>(), Enumerable.Intersect)` (modulo the little optimization if enums is non-empty).

dtb 2009-12-15 03:21:30

You're right. It's `enums.DefaultIfEmpty(Enumerable.Empty<T>()).Aggregate(Enumerable.Intersect);` then :-)
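For reference, a self-contained sketch of that one-liner. The wrapper names (`AggregateIntersection`, `IntersectAll`) are hypothetical, and a lambda is used in place of the `Enumerable.Intersect` method group purely to keep overload resolution unambiguous.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregateIntersection
{
    // Hypothetical wrapper around the Aggregate-based one-liner.
    public static IEnumerable<T> IntersectAll<T>(IEnumerable<IEnumerable<T>> enums)
    {
        return enums
            // Seedless Aggregate throws on an empty source, so substitute
            // a single empty sequence when there are no inputs.
            .DefaultIfEmpty(Enumerable.Empty<T>())
            // Fold the sequences pairwise; each step is a deferred Intersect.
            .Aggregate((acc, next) => acc.Intersect(next));
    }
}
```

Like the loop version, this chains one `Intersect` per input, so it does not take advantage of the inputs being sorted.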

dtb 2009-12-15 06:56:12

Answer 2

+3 A:

OK; more complex answer:

    public static IEnumerable<T> Intersect<T>(params IEnumerable<T>[] enums) {
        return Intersect<T>(null, enums);
    }
    public static IEnumerable<T> Intersect<T>(IComparer<T> comparer, params IEnumerable<T>[] enums) {
        if(enums == null) throw new ArgumentNullException("enums");
        if(enums.Length == 0) return Enumerable.Empty<T>();
        if(enums.Length == 1) return enums[0];
        if(comparer == null) comparer = Comparer<T>.Default;
        return IntersectImpl(comparer, enums);
    }
    public static IEnumerable<T> IntersectImpl<T>(IComparer<T> comparer, IEnumerable<T>[] enums) {
        IEnumerator<T>[] iters = new IEnumerator<T>[enums.Length];
        try {
            // create iterators and move as far as the first item
            for (int i = 0; i < enums.Length; i++) {
                if(!(iters[i] = enums[i].GetEnumerator()).MoveNext()) {
                    yield break; // no data for one of the iterators
                }
            }
            bool first = true;
            T lastValue = default(T);
            do { // get the next item from the first sequence
                T value = iters[0].Current;
                if (!first && comparer.Compare(value, lastValue) == 0) continue; // dup in first source
                bool allTrue = true;
                for (int i = 1; i < iters.Length; i++) {
                    var iter = iters[i];
                    // if any sequence isn't there yet, progress it; if any sequence
                    // ends, we're all done
                    while (comparer.Compare(iter.Current, value) < 0) {
                        if (!iter.MoveNext()) goto alldone; // nasty, but
                    }
                    // if any sequence is now **past** value, then short-circuit
                    if (comparer.Compare(iter.Current, value) > 0) {
                        allTrue = false;
                        break;
                    }
                }
                // so all sequences have this value
                if (allTrue) yield return value;
                first = false;
                lastValue = value;
            } while (iters[0].MoveNext());
        alldone:
            ;
        } finally { // clean up all iterators
            for (int i = 0; i < iters.Length; i++) {
                if (iters[i] != null) {
                    try { iters[i].Dispose(); }
                    catch { }
                }
            }
        }
    }

Marc Gravell 2009-12-14 07:00:31

Amazing. Thanks! Interestingly, my second attempt is faster than this solution for n=2, but this solution is faster than my second attempt chained for any n!=0. Any solution involving `Enumerable.Intersect` is much slower than both.

dtb 2009-12-14 09:04:03

Any rough estimate on the complexity of this algorithm? I'm tempted to say it's `O(n0+n1+..nn)` but I've got a feeling that's wrong...

dtb 2009-12-14 09:05:19

It never rewinds anything; you could argue that it is O(m * min(n[1],n[2],...n[m])), (m = number of sequences, each of length n[i]); since it only runs until **any** sequence is exhausted, and iterates all the sequences at the same rate until then.
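One way to sanity-check an estimate like this is to count how many items each source actually hands out. The sketch below is an assumption-laden illustration, not the code from the answer: `ConsumptionDemo`, `Counted`, and `IntersectSorted` are hypothetical names, and the intersection is a compact two-sequence variant that does not skip duplicates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConsumptionDemo
{
    // Hypothetical helper: passes items through and counts how many were pulled.
    public static IEnumerable<T> Counted<T>(IEnumerable<T> source, int[] counter, int slot)
    {
        foreach (var item in source)
        {
            counter[slot]++;
            yield return item;
        }
    }

    // Compact two-sequence intersection of ascending inputs (illustration only).
    public static IEnumerable<int> IntersectSorted(IEnumerable<int> a, IEnumerable<int> b)
    {
        using (var x = a.GetEnumerator())
        using (var y = b.GetEnumerator())
        {
            if (!x.MoveNext() || !y.MoveNext()) yield break;
            while (true)
            {
                int c = x.Current.CompareTo(y.Current);
                if (c < 0) { if (!x.MoveNext()) yield break; }      // a is behind: advance a
                else if (c > 0) { if (!y.MoveNext()) yield break; } // b is behind: advance b
                else
                {
                    yield return x.Current;                          // present in both
                    if (!x.MoveNext() || !y.MoveNext()) yield break;
                }
            }
        }
    }
}
```

With `Enumerable.Range(1, 1000)` and the single-item sequence `{ 1000 }` as inputs, the counters end at 1000 and 1: nothing is rewound, but the longer sequence is still read in full. That suggests the `m * min(...)` figure describes inputs that advance at a similar rate, while the sum of the lengths remains the safe upper bound.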

Marc Gravell 2009-12-14 10:23:37

Right, thanks. Looks like the problem can't be solved more efficiently than this, complexity-wise :-)

dtb 2009-12-15 03:25:17
