tags:

views:

85

answers:

4

I have:-

IEnumerable<IEnumerable<T>> items;

and I'd like to create:-

IEnumerable<IEnumerable<T>> results;

where the first item in "results" is an IEnumerable of the first item of each of the IEnumerables of "items", the second item in "results" is an IEnumerable of the second item of each of "items", etc.

The IEnumerables aren't necessarily the same lengths. If some of the IEnumerables in items don't have an element at a particular index, then I'd expect the matching IEnumerable in results to have fewer items in it.

For example:-

items = { "1", "2", "3", "4" } , { "a", "b", "c" };
results = { "1", "a" } , { "2", "b" }, { "3", "c" }, { "4" };

Edit: Another example (requested in comments):-

items = { "1", "2", "3", "4" } , { "a", "b", "c" }, { "p", "q", "r", "s", "t" };
results = { "1", "a", "p" } , { "2", "b", "q" }, { "3", "c", "r" }, { "4", "s" }, { "t" };

I don't know in advance how many sequences there are, nor how many elements are in each sequence. I might have 1,000 sequences with 1,000,000 elements in each, and I might only need the first ~10, so I'd like to use the (lazy) enumeration of the source sequences if I can. In particular I don't want to create a new data structure if I can help it.

Is there a built-in method (similar to IEnumerable.Zip) that can do this?

Is there another way?

+3  A: 

Now lightly tested and with working disposal.

public static class Extensions
{
  public static IEnumerable<IEnumerable<T>> JaggedPivot<T>(
    this IEnumerable<IEnumerable<T>> source)
  {
    List<IEnumerator<T>> originalEnumerators = source
      .Select(x => x.GetEnumerator())
      .ToList();

    try
    {
      List<IEnumerator<T>> enumerators = originalEnumerators
        .Where(x => x.MoveNext()).ToList();

      while (enumerators.Any())
      {
        List<T> result = enumerators.Select(x => x.Current).ToList();
        yield return result;
        enumerators = enumerators.Where(x => x.MoveNext()).ToList();
      }
    }
    finally
    {
      originalEnumerators.ForEach(x => x.Dispose());
    }
  } 
}

public class TestExtensions
{
  public void Test1()
  {
    IEnumerable<IEnumerable<int>> myInts = new List<IEnumerable<int>>()
    {
      Enumerable.Range(1, 20).ToList(),
      Enumerable.Range(21, 5).ToList(),
      Enumerable.Range(26, 15).ToList()
    };

    foreach(IEnumerable<int> x in myInts.JaggedPivot().Take(10))
    {
      foreach(int i in x)
      {
        Console.Write("{0} ", i);
      }
      Console.WriteLine();
    }
  }
}
David B
Note that you're not disposing of any of the iterators, which could be bad depending on what they're iterating over.
Jon Skeet
Thanks! Will test.
Iain Galloway
Innnnnnteresting. I didn't know finally would be activated in these iterator methods when there's laziness happening. Makes sense though, as the caller will eventually call Dispose on the lazy IEnumerator.
David B
I've accepted this answer, but make sure you read Jon Skeet's answer as well.
Iain Galloway
+1  A: 

It's reasonably straightforward to do if you can guarantee how the results are going to be used. However, if the results might be used in an arbitrary order, you may need to buffer everything. Consider this:

var results = MethodToBeImplemented(sequences);
var iterator = results.GetEnumerator();
iterator.MoveNext();
var first = iterator.Current;
iterator.MoveNext();
var second = iterator.Current;
foreach (var x in second)
{
    // Do something
}
foreach (var x in first)
{
    // Do something
}

In order to get at the items in "second" you'll have to iterate over all of the subsequences, past the first items. If you then want it to be valid to iterate over the items in first you either need to remember the items or be prepared to re-evaluate the subsequences.

Likewise you'll either need to buffer the subsequences as IEnumerable<T> values or reread the whole lot each time.

Basically it's a whole can of worms which is difficult to do elegantly in a way which will work pleasantly for all situations :( If you have a specific situation in mind with appropriate constraints, we may be able to help more.

Jon Skeet
Thanks! Thankfully I can guarantee that the results are going to be used in order. Does that alleviate the need for the calls to "ToList" in DavidB's answer? (Going to test now)
Iain Galloway
@Iain: I suspect so, although it may be quite complex to get it right. You'll still need to buffer the subsequence references themselves as `IEnumerable<T>`. Hmmm. Instead of returning an `IEnumerable<IEnumerable<T>>`, could you pass in an action to be executed on each value, or something like that? It would make things a lot simpler!
Jon Skeet
@Jon: I don't think I can pass in an Action (good idea though, I'll keep it in mind for similar problems in the future!). At this stage, I'm thinking that if making it elegant will make it super-complicated, I'll just go ahead and inline it all.
Iain Galloway
A: 

Here's one that is a bit shorter, but no doubt less efficient:

Enumerable.Range(0,items.Select(x => x.Count()).Max())
    .Select(x => items.SelectMany(y => y.Skip(x).Take(1)));
diceguyd30
Similarly to AS-CII's answer, this calls GetEnumerator() on the source far too often.
Iain Galloway
A: 

What about this?

        List<string[]> items = new List<string[]>()
        {
            new string[] { "a", "b", "c" },
            new string[] { "1", "2", "3" },
            new string[] { "x", "y" },
            new string[] { "y", "z", "w" }
        };

        var x = from i in Enumerable.Range(0, items.Max(a => a.Length))
                select from z in items
                       where z.Length > i
                       select z[i];
AS-CII
If I had the data in memory already in a form that I could do random-access on it, it would be this easy. Unfortunately I need the lazy enumeration of the source IEnumerables. Calling z[i] - or z.ElementAt(i) - undermines that.
Iain Galloway