tags:

views:

171

answers:

5

Say I have an IEnumerable. For example, {2,1,42,0,9,6,5,3,8}.

I need to get "runs" of items that match a predicate. For example, if my predicate was

bool isSmallerThanSix(int number){...}

I would want to get the following output: {{2,1},{0},{5,3}}

Is there a built-in function that accomplishes this?

So far I have this:

public static IEnumerable<IEnumerable<T>> GetSequences<T>(this IEnumerable<T> source,
      Func<T, bool> selector) {

        if (source == null || selector == null) {
   yield break;
        }

  IEnumerable<T> rest = source.SkipWhile(obj => !selector(obj));

  while (rest.Count() > 0) {
   yield return rest.TakeWhile(obj => selector(obj));
   rest = rest
     .SkipWhile(obj => selector(obj))
     .SkipWhile(obj => !selector(obj));
  }


 }

which seems to work, but was written by me in the middle of the night and thus inefficient fifteen ways from Tuesday. Is there a better, preferably built-in (and therefore well-tested) way?

Thank y'all so much for your time,

Ria.

+6  A: 

There is not a build in method as far as I'm aware. However, calling the Count extension method on an IEnumerable isn't very efficient as it has to enumerate the list to get the count. Therefore, I've come up with this that has the same effect.

public static IEnumerable<IEnumerable<T>> 
    GetSequences<T>(this IEnumerable<T> source, Func<T, bool> selector)
{
    // omitted null checks for brevity
    var list = new List<T>();

    foreach(var item in source)
    {
        if (selector.Invoke(item))
        {
            list.Add(item);
        }
        else if (list.Count > 0)
        {
            yield return list;
            list = new List<T>();
        }
    }

    if (list.Count > 0)
        yield return list;
}

As Jon Skeet mentioned, the use of SkipWhile and TakeWhile also seem pretty inefficient in this case as they will create iterator upon iterator upon iterator. You can check this out as when debugging your example it goes a bit crazy as you step through it trying to find the next sequence and so on even though the example is simple.

Garry Shutler
IMO you shouldn't reuse the existing list - if the caller buffers the results (e.g. by calling ToList()) they'll end up with the same list multiple times.
Jon Skeet
Good point. Altered accordingly.
Garry Shutler
The fact that we've come up with pretty much identical solutions is certainly encouraging for the chances of it working :)
Jon Skeet
Yes, I was thinking just that!
Garry Shutler
So no library method, eh? That's not nice of the .Net folks, but thank you very much.:)
Ria
+1  A: 

I suspect your code won't actually work in all cases. In particular, SkipWhile and TakeWhile are lazily evaluated - if the calling code doesn't actually read through all of the yielded IEnumerable<T>s (or worse, buffers them up and reads them in a different order!) I strongly suspect you'll get the wrong results.

I suspect you really need to do something like:

public static IEnumerable<IEnumerable<T>> GetSequences<T>(
    this IEnumerable<T> source,
    Func<T, bool> selector)
{
    List<T> current = new List<T>();
    foreach (T element in source)
    {
        if (selector(element))
        {
            current.Add(element);
        }
        else if (current.Count > 0)
        {
            yield return current;
            current = new List<T>();
        }           
    }
    if (current.Count > 0)
    {
        yield return current;
    }
}

(This ignores error checking - due to the deferred execution of iterator blocks, you'd want to do that in a separate method which then calls this method as a private one - it's a very common pattern when writing production-quality iterator blocks.)

The choice of List<T> is somewhat arbitrary, btw - you could certainly use a LinkedList<T> instead, for example. Or if you do go for List, you could return IEnumerable<IList<T>> instead of IEnumerable<IEnumerable<T>> which may make it easier for callers to process the results.

Jon Skeet
Jon, what do you mean by "due to the deferred execution of iterator blocks, ... do that in a separate method which then calls this method as a private one"? I do this a lot for recursive routines, but this is not recursive, so I don't follow the strategy. Would you mind elaborating? Much obliged...
Mike Rosenblum
Ah, never mind, I found your outstanding article on the topic here: http://msmvps.com/blogs/jon_skeet/archive/2009/01/23/designing-linq-operators.aspx. Thanks for answering my question so fast! :-) :-p
Mike Rosenblum
I'm glad you enjoyed that blog post, but a more pertinent one is the linked one: http://msmvps.com/blogs/jon_skeet/archive/2008/03/02/c-4-idea-iterator-blocks-and-parameter-checking.aspx
Jon Skeet
Yes, I did find that one as well. Thanks so much, Jon. :-)
Mike Rosenblum
A: 

Not sure I know what I am talking about but that won't stop me from speaking...

Does it make sense to define a new type of iterator that can skip to the next sub-sequence when the current is complete?

Using Java and Integer example syntax, the client code looks like the following:

Iterator2<Integer> iter = getSequencences(someCollection, someCriteria)
while (iter.hasNextSequence())
{
  iter.nextSequence();
  while (iter.hasNext())
  {
    Integer i = iter.next();
  }
}

*ducks*

Hemal Pandya
A: 

By working directly with the enumerator it is possible to produce fully deferred results.

I'd probably use the version presented by Garry and Jon as it is simpler and gets the job done, but this was fun to write.

public static class Program
{
    static IEnumerable<IEnumerable<T>>
        GetSequences<T>(this IEnumerable<T> source, Func<T, bool> selector)
    {
        var enumerator = source.GetEnumerator();
        while (enumerator.MoveNext())
            if (selector(enumerator.Current))
                yield return TakeWhile(enumerator, selector);
        yield break;
    }

    static IEnumerable<T> 
        TakeWhile<T>(IEnumerator<T> enumerator, Func<T, bool> selector)
    {
        do
        {
            yield return enumerator.Current;
        } while (enumerator.MoveNext() && selector(enumerator.Current));
        yield break;
    }

    static void Main()
    {
        var nums = new[] { 2, 1, 42, 0, 9, 6, 5, 3, 8 };
        var seqs = nums.GetSequences(i => i < 6);

        Console.WriteLine(string.Join(",", seqs.Select(s =>
            string.Format("{{{0}}}",
                string.Join(",", s.Select(i => i.ToString()).ToArray())
            )).ToArray()));
    }
}
Leaf Garland
That suffers the same problem as the original though - you're still using a single enumerator. That relies on the caller iterating through the returned sequences at exactly the expected times, which isn't very robust.
Jon Skeet
A: 

You could exploit the fact that GroupBy will run through the items sequentially.

    public static IEnumerable<IEnumerable<T>> GetSequences<T>
      (this IEnumerable<T> source, Func<T, bool> predicate)
    {
        bool flag = false;
        int id = 0;
        return source.GroupBy(x =>
            {
                bool match = predicate(x);
                if (match != flag)
                {
                    id += 1;
                    flag = match;
                }
                return new { keep = match, id = id };
            })
            .Where(g => g.Key.keep)
            .Select(g => g.AsEnumerable());
    }

Here's a test method:

    public static void Test1()
    {
        List<int> myList = new List<int>() {2,1,42,0,9,6,5,3,8};

        foreach (var g in myList.GetSequences(i => i < 6))
        {
            Console.WriteLine("g");
            foreach (int i in g)
            {
                Console.WriteLine(i);
            }
        }
    }
David B