Given n enumerables of the same type that return distinct elements in ascending order, for example:
IEnumerable<char> s1 = "adhjlstxyz";
IEnumerable<char> s2 = "bdeijmnpsz";
IEnumerable<char> s3 = "dejlnopsvw";
I want to efficiently find all values that are elements of all enumerables:
IEnumerable<char> sx = Intersect(new[] { s1, s2, s3 });
Debug.Assert(sx.SequenceEqual("djs"));
"Efficiently" here means that
- the input enumerables should each be enumerated only once,
- the elements of the input enumerables should be retrieved only when needed, and
- the algorithm should not recursively enumerate its own output.
I need some hints how to approach a solution.
Here is my (naive) attempt so far:
static IEnumerable<T> Intersect<T>(IEnumerable<T>[] enums)
{
return enums[0].Intersect(
enums.Length == 2 ? enums[1] : Intersect(enums.Skip(1).ToArray()));
}
Enumerable.Intersect collects the first enumerable into a HashSet, then enumerates the second enumerable and yields all matching elements.
Intersect
then recursively intersects the result with the next enumerable.
This obviously isn't very efficient (it doesn't meet the constraints). And it doesn't exploit the fact that the elements are sorted at all.
Here is my attempt to intersect two enumerables. Maybe it can be generalized for n enumerables?
static IEnumerable<T> Intersect<T>(IEnumerable<T> first, IEnumerable<T> second)
{
using (var left = first.GetEnumerator())
using (var right = second.GetEnumerator())
{
var leftHasNext = left.MoveNext();
var rightHasNext = right.MoveNext();
var comparer = Comparer<T>.Default;
while (leftHasNext && rightHasNext)
{
switch (Math.Sign(comparer.Compare(left.Current, right.Current)))
{
case -1:
leftHasNext = left.MoveNext();
break;
case 0:
yield return left.Current;
leftHasNext = left.MoveNext();
rightHasNext = right.MoveNext();
break;
case 1:
rightHasNext = right.MoveNext();
break;
}
}
}
}