I know it probably doesn't matter or affect performance for the most part, but I hate the idea of getting an IEnumerable and calling .Count() on it. Is there an IsEmpty or NotEmpty or some similar function (like the STL's empty())?

+3  A: 

Without any need for LINQ, you can do the following:

bool IsEmpty(IEnumerable en)
{
    foreach(var c in en) { return false; }
    return true;
}
Yossarian
Even faster would be return !en.GetEnumerator().MoveNext(), which is basically what Any()'s implementation does (without the negation). You're pretty much doing the same with the foreach, but you're incurring the additional cost of assigning the first value to a local variable that you will never use.
KeithS
@KeithS: that's not very safe: an enumerator can be IDisposable as well (for several enumerables this is very relevant, for instance when you have a try-finally around your yield statements). foreach takes care of disposing it properly. BTW, the method above is pretty much what `Any()` does.
Ruben
OK then: using (var iter = en.GetEnumerator()) return !iter.MoveNext(); You're still not assigning the current value like you would with a foreach.
KeithS
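
For reference, the using-based variant discussed in these comments could be sketched as follows. This is only a minimal sketch: the class name EnumerableChecks is made up for illustration, and it assumes the generic IEnumerable<T>, because the non-generic IEnumerator does not implement IDisposable and cannot be used in a using statement directly.

```csharp
using System.Collections.Generic;

// Hypothetical helper class; the name is not from the thread.
public static class EnumerableChecks
{
    // Returns true when the sequence yields no elements.
    public static bool IsEmpty<T>(IEnumerable<T> source)
    {
        // IEnumerator<T> is IDisposable, so using guarantees cleanup
        // (for example, finally blocks inside an iterator method).
        using (IEnumerator<T> e = source.GetEnumerator())
        {
            // MoveNext() returns false straight away for an empty sequence.
            return !e.MoveNext();
        }
    }
}
```

A call such as EnumerableChecks.IsEmpty(new int[0]) returns true, and the element itself is never read through Current.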
+1  A: 

On IEnumerable or IEnumerable<T>, no.

But it really doesn't make much sense. If a collection is empty and you try to iterate over it using IEnumerable, the call to IEnumerator.MoveNext() will simply return false at no performance cost.

Justin Niessner
"... at no performance cost" - if it's not a collection, but instead a lazy-evaluated enumeration, there may be a significant cost.
Joe
@Joe - But you're going to incur the same cost calling IEnumerable.Any() or IEnumerable.Count() as well.
Justin Niessner
@Justin. I agree. Hence in some cases it's worth considering instantiating a list so that this cost is only incurred once. See my answer.
Joe
+19  A: 

You want the Any() extension method on IEnumerable<T> (.NET Framework 3.5 and above). It avoids counting all the elements.
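
A rough way to see the difference is to count how many elements a lazy sequence actually produces. In the sketch below, the class AnyVersusCount, the Produced counter and the Numbers method are all invented for illustration; only Any() and Count() come from System.Linq.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; none of these names come from the thread.
public static class AnyVersusCount
{
    // How many elements the iterator below has produced so far.
    public static int Produced;

    // Lazy sequence: each element is generated only when requested.
    public static IEnumerable<int> Numbers(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Any() stops after the first successful MoveNext().
    public static bool HasItems(int n)
    {
        Produced = 0;
        return Numbers(n).Any();
    }

    // Count() has to walk this sequence to the end.
    public static int CountItems(int n)
    {
        Produced = 0;
        return Numbers(n).Count();
    }
}
```

After AnyVersusCount.HasItems(1000) the Produced counter is 1, whereas after AnyVersusCount.CountItems(1000) it is 1000.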

cristobalito
To elaborate, it examines at most one element and is very efficient.
Bear Monkey
"...and is very efficient" - its efficiency depends on the implementation. If it's a lazy-evaluated enumeration (e.g. a method that uses yield), then even starting the enumeration can be costly.
Joe
Costly doesn't mean it's not efficient.
Bear Monkey
If it's a lazy-evaluated enumerable, then counting it will be even more costly. In all cases, Any() will produce the answer more quickly than Count() because it succeeds (and stops processing) on the first successful result.
KeithS
@KeithS. I completely agree. But the point is that the cost of even starting the enumeration may be significant, and may even be much larger than the cost of actually doing the iteration. Any is better than Count, but you may need to consider whether instantiating a collection is more appropriate - see my answer.
Joe
@cristobalito: Will an extension method on an interface always be used, even if the class implementing the interface offers a method of the same name, or is there some way for a class that implements IEnumerable(Of T) to indicate that it has its own definition of "Any"?
supercat
@supercat: not sure about that. No doubt the standard defines the precedence in such cases. The quickest way to find out would be to knock up a small sample app.
cristobalito
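
A small sample along those lines might look like the sketch below. In C#, an applicable instance method is preferred over an extension method, but binding depends on the static type of the expression: through a variable typed as the class the instance method is chosen, while through a variable typed as IEnumerable<T> only the extension method is visible. The Bag and PrecedenceCheck types are hypothetical names made up for this illustration.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection that supplies its own Any() instance method.
public class Bag : IEnumerable<int>
{
    private readonly List<int> items = new List<int>();

    // Set when the instance method (not the LINQ extension) runs.
    public bool InstanceAnyCalled;

    public void Add(int value) { items.Add(value); }

    // Instance method with the same name as the LINQ extension method.
    public bool Any()
    {
        InstanceAnyCalled = true;
        return items.Count > 0;
    }

    public IEnumerator<int> GetEnumerator() { return items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class PrecedenceCheck
{
    // Static type is Bag: the compiler binds to the instance method.
    public static bool ViaClass(Bag bag) { return bag.Any(); }

    // Static type is IEnumerable<int>: only Enumerable.Any() is visible.
    public static bool ViaInterface(IEnumerable<int> seq) { return seq.Any(); }
}
```

So a class can supply its own Any(), but callers that only hold the interface will still get the LINQ version.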
A: 

I don't think so, that's what Count is for. Besides, what will be faster:

  1. Accessing a Property and retrieving a stored Integer
  2. Accessing a Property and retrieving a stored Boolean
Bobby
Count on IEnumerable can be quite expensive (for example, enumerables with lazy evaluation). My solution is, I think, better.
Yossarian
In a List or similar object where Count is readily available, it will be cheaper. But in a general IEnumerable implementation, the naive implementation for Count() would be to iterate over the elements, which is potentially a lot of overhead for what the OP is asking for.
cristobalito
@Yossarian: I didn't know that there is a `Count()` function; every collection I've seen uses a property that returns a private variable.
Bobby
@Bobby, collections aren't the only things that can implement `IEnumerable`. Look up the `yield return` keyword if you don't know what I'm talking about.
Yossarian
+1  A: 

You can use extension methods such as Any() or Count(). Count() is more costly than Any(), since it must execute the whole enumeration, as others have pointed out.

But in the case of lazy evaluation (e.g. a method that uses yield), either can be costly. For example, with the following IEnumerable implementation, each call to Any or Count will incur the cost of a new roundtrip to the database:

IEnumerable<MyObject> GetMyObjects(...)
{
    using(IDbConnection connection = ...)
    {
         using(IDataReader reader = ...)
         {
             while(reader.Read())
             {
                 yield return GetMyObjectFromReader(reader);
             }
         }
    }
}

I think the moral is:

  • If you only have an IEnumerable<T>, and you want to do more than just enumerate it (e.g. use Count or Any), then consider first converting it to a List (extension method ToList). In this way you guarantee to only enumerate once.

  • If you are designing an API that returns a collection, consider returning ICollection<T> (or even IList<T>) rather than IEnumerable<T>, which is what many people seem to recommend. By doing so you strengthen your contract to guarantee no lazy evaluation (and therefore no multiple evaluation).
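
To illustrate the first point, here is a minimal sketch that counts how often a lazy source is started. The class EnumerateOnce, the Starts counter and the LoadNames method are hypothetical stand-ins for an expensive source such as the database example above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; none of these names come from the thread.
public static class EnumerateOnce
{
    // Number of times the (pretend) expensive source has been started.
    public static int Starts;

    // Stand-in for a costly lazy source such as a database query.
    public static IEnumerable<string> LoadNames()
    {
        Starts++;               // runs each time an enumeration begins
        yield return "a";
        yield return "b";
    }

    // Any() and Count() each start a fresh enumeration of the source.
    public static int CountLazily()
    {
        Starts = 0;
        IEnumerable<string> names = LoadNames();
        return names.Any() ? names.Count() : 0;
    }

    // ToList() enumerates once; later checks read the in-memory list.
    public static int CountFromList()
    {
        Starts = 0;
        List<string> names = LoadNames().ToList();
        return names.Count;
    }
}
```

Both methods return 2, but Starts ends up at 2 after CountLazily() and at 1 after CountFromList().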

Please note I am saying you should consider returning a collection, not always return a collection. As always there are trade-offs, as can be seen from the comments below.

  • @KeithS thinks you should never yield on a DataReader, and while I never say never, I'd say it's generally sound advice that a Data Access Layer should return an ICollection<T> rather than a lazy-evaluated IEnumerable<T>, for the reasons KeithS gives in his comment.

  • @Bear Monkey notes that instantiating a List could be expensive in the above example if the database returns a large number of records. That's true too, and in some (probably rare) cases it may be appropriate to ignore @KeithS's advice and return a lazy-evaluated enumeration, provided the consumer is doing something that is not too time-consuming (e.g. generating some aggregate values).

Joe
+1, good point.
acidzombie24
This is a bad example. You should NEVER yield on a DataReader, for reasons other than performance (it keeps locks in place longer, turns a "firehose" data stream into a trickle, etc). However, slurping all the data into a List, then yielding through it, would perform similarly and still illustrate the point.
KeithS
ToListing before returning could be even more costly than the repeated instantiation costs if, for instance, the database returns millions of records. I'd rather return IEnumerable<T> and let the consumer decide whether they want to cache the results with ToList. Also what Keith said.
Bear Monkey
@KeithS I agree with your sentiment, and would not expose such a method in a public API for the reasons you mention and others. But the point here is simply to provide an artificial example of an extreme case where starting the enumeration may take much longer than actually enumerating.
Joe
@Bear Monkey. "ToListing ... could be ... more costly ... if ... the database returns millions of records". I absolutely agree. Which is why I said you should "consider" using ToList rather than "always" use ToList.
Joe
+1  A: 

Keep in mind that IEnumerable is just an interface. The implementation behind it can differ greatly from class to class (consider Joe's example). The Any() extension method has to take a generic approach and may not be what you want, performance-wise. Yossarian suggests a means that should work for many classes, but if the underlying implementation does not use 'yield' you could still pay a price.

Generally, if you stick to collections or arrays wrapped in an IEnumerable interface, then Cristobalito and Yossarian probably have the best answers. My guess is that the built-in .Any() extension method does what Yossarian recommends.

Les