views:

508

answers:

4

I just want know if a "FindAll" will be faster than a "Where" extentionMethod and why?

Example :

myList.FindAll(item=> item.category == 5);

or

myList.Where(item=> item.category == 5);

Which is better ?

+2  A: 

Well, at least you can try to measure it.

The static Where method is implemented using an iterator bloc (yield keyword), which basically means that the execution will be deferred. If you only compare the calls to theses two methods, the first one will be slower, since it immediately implies that the whole collection will be iterated.

But if you include the complete iteration of the results you get, things can be a bit different. I'm pretty sure the yield solution is slower, due to the generated state machine mechanism it implies. (see @Matthew anwser)

Romain Verdier
I not sure if the two solution have the same algoritme behind.
Cédric Boivin
You're right, FindAll is not implemented used an iterator bloc. So it may be a bit slower than the static Where method.
Romain Verdier
+20  A: 

Well, FindAll copies the matching elements to a new list, whereas Where just returns a lazily evaluated sequence - no copying is required.

I'd therefore expect Where to be slightly faster than FindAll even when the resulting sequence is fully evaluated - and of course the lazy evaluation strategy of Where means that if you only look at (say) the first match, it won't need to check the remainder of the list. (As Matthew points out, there's work in maintaining the state machine for Where. However, this will only have a fixed memory cost - whereas constructing a new list may require multiple array allocations etc.)

Basically, FindAll(predicate) is closer to Where(predicate).ToList() than to just Where(predicate).

Just to react a bit more to Matthew's answer, I don't think he's tested it quite thoroughly enough. His predicate happens to pick half the items. Here's a short but complete program which tests the same list but with three different predicates - one picks no items, one picks all the items, and one picks half of them. In each case I run the test fifty times to get longer timing.

I'm using Count() to make sure that the Where result is fully evaluated. The results show that collecting around half the results, the two are neck and neck. Collecting no results, FindAll wins. Collecting all the results, Where wins. I find this intriguing: all of the solutions become slower as more and more matches are found: FindAll has more copying to do, and Where has to return the matched values instead of just looping within the MoveNext() implementation. However, FindAll gets slower faster than Where does, so loses its early lead. Very interesting.

Results:

FindAll: All: 11994
Where: All: 8176
FindAll: Half: 6887
Where: Half: 6844
FindAll: None: 3253
Where: None: 4891

(Compiled with /o+ /debug- and run from the command line, .NET 3.5.)

Code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Test
{
    static List<int> ints = Enumerable.Range(0, 10000000).ToList();

    static void Main(string[] args)
    {
        Benchmark("All", i => i >= 0); // Match all
        Benchmark("Half", i => i % 2 == 0); // Match half
        Benchmark("None", i => i < 0); // Match none
    }

    static void Benchmark(string name, Predicate<int> predicate)
    {
        // We could just use new Func<int, bool>(predicate) but that
        // would create one delegate wrapping another.
        Func<int, bool> func = (Func<int, bool>) 
            Delegate.CreateDelegate(typeof(Func<int, bool>), predicate.Target,
                                    predicate.Method);
        Benchmark("FindAll: " + name, () => ints.FindAll(predicate));
        Benchmark("Where: " + name, () => ints.Where(func).Count());
    }

    static void Benchmark(string name, Action action)
    {
        GC.Collect();
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < 50; i++)
        {
            action();
        }
        sw.Stop();
        Console.WriteLine("{0}: {1}", name, sw.ElapsedMilliseconds);
    }
}
Jon Skeet
I am very not sure with your affirmation, because if i use a findAll and i modify my result collection, the initial collection are modify to, so could not be a copy
Cédric Boivin
if the element is value rather than reference, you will see the copy
Benny
How were you modifying the result collection? My guess is you were changing the data within one of the elements. The list just contains references (assuming T is a reference type). Modifying the contents of an object isn't really modifying the list itself. You'd see the same behaviour in any other way. However, the *list itself* returned by FindAll is independent of the original list - for example, if you add a new element to the original list *after* calling FindAll, that doesn't show up in the result list.
Jon Skeet
See http://pobox.com/~skeet/csharp/references.html for more information.
Jon Skeet
@Cédric: are you sure that it is the *collection* that is modified, and not *the objects in the collection*? Since the returned list will contain references to the same objects (but just some of them) as the original list, it's the same object instances you are working with.
Fredrik Mörk
Yes i make a mistake in my comment, the object in the collection is the same, but the collection is a new collection, i understand. Sory jon
Cédric Boivin
The where is going to have to process the Enumerator state machine. that will have much more over head then FindAll
Matthew Whited
@Matthew: It depends on exactly what you're doing. The state machine isn't hugely complicated, and you won't need to do any copying - including not doing any array resizing etc. Also note my point that if you only need (say) the first three elements, you're wasting a lot of work with FindAll by creating the full list.
Jon Skeet
Understood but that is not a fair comparison. You could also just extend the predicate to include the logic for the early exit.
Matthew Whited
@Matthew: And that's exactly why I said that Where and FindAll aren't directly comparable as part of my answer.
Jon Skeet
Thank you for the nice benchmark method.
Yuriy Faktorovich
+1  A: 

I can give some clue, but not sure which one faster. FindAll() is executed right away. Where() is defferred executed.

Benny
+10  A: 

How about we test instead of guess? Shame to see the wrong answer get out.

var ints = Enumerable.Range(0, 10000000).ToList();
var sw1 = Stopwatch.StartNew();
var findall = ints.FindAll(i => i % 2 == 0);
sw1.Stop();

var sw2 = Stopwatch.StartNew();
var where = ints.Where(i => i % 2 == 0).ToList();
sw2.Stop();

Console.WriteLine("sw1: {0}", sw1.ElapsedTicks);
Console.WriteLine("sw2: {0}", sw2.ElapsedTicks);
/*
Debug
sw1: 1149856
sw2: 1652284

Release
sw1: 532194
sw2: 1016524
*/

Edit:

Even if I turn the above code from

var findall = ints.FindAll(i => i % 2 == 0);
...
var where = ints.Where(i => i % 2 == 0).ToList();

... to ...

var findall = ints.FindAll(i => i % 2 == 0).Count;
...
var where = ints.Where(i => i % 2 == 0).Count();

I get these results

/*
Debug
sw1: 1250409
sw2: 1267016

Release
sw1: 539536
sw2: 600361
*/

Edit 2.0...

If you want a list of the subset of the current list the fastest method if the FindAll(). The reason for this is simple. The FindAll instance method uses the indexer on the current List instead of the enumerator state machine. The Where() extension method is an external call to a different class that uses the enumerator. If you step from each node in the list to the next node you will have to call the MoveNext() method under the covers. As you can see from the above examples it is even faster to use the index entries to create a new list (that is pointing to the original items, so memory bloat will be minimal) to even just get a count of the filtered items.

Now if you are going to early abort from the Enumerator the Where() method could be faster. Of course if you move the early abort logic to the predicate of the FindAll() method you will again be using the indexer instead of the enumerator.

Now there are other reasons to use the Where() statement (such as the other linq methods, foreach blocks and many more) but the question was is the FindAll() faster than Where(). And unless you don't execute the Where() the answer seems to be yes. (When comparing apples to apples)

I am not say don't use LINQ or the .Where() method. They make for code that is much simpler to read. The question was about performance and not about how easy you can read and understand the code. By fast the fastest way to do this work would be to use a for block stepping each index and doing any logic as you want (even early exits). The reason LINQ is so great is becasue of the complex expression trees and transformation you can get with them. But using the iterator from the .Where() method has to go though tons of code to find it's way to a in memory statemachine that is just getting the next index out of the List. It should also be noted that this .FindAll() method is only useful on objects that implmented it (such as Array and List.)

Yet more...

for (int x = 0; x < 20; x++)
{
    var ints = Enumerable.Range(0, 10000000).ToList();
    var sw1 = Stopwatch.StartNew();
    var findall = ints.FindAll(i => i % 2 == 0).Count;
    sw1.Stop();

    var sw2 = Stopwatch.StartNew();
    var where = ints.AsEnumerable().Where(i => i % 2 == 0).Count();
    sw2.Stop();

    var sw4 = Stopwatch.StartNew();
    var cntForeach = 0;
    foreach (var item in ints)
        if (item % 2 == 0)
            cntForeach++; 
    sw4.Stop();

    Console.WriteLine("sw1: {0}", sw1.ElapsedTicks);
    Console.WriteLine("sw2: {0}", sw2.ElapsedTicks);
    Console.WriteLine("sw4: {0}", sw4.ElapsedTicks);
}


/* averaged results
sw1 575446.8
sw2 605954.05
sw3 394506.4
/*
Matthew Whited
+1, but do note that if you don't need to do a ToList after Where, it will probably be faster, and will take less memory.
Meta-Knight
that may be true. but you have to do something with the where. And if you to .ToArray the results are the same. If you look at the code behind the two methods you can see which does more work.
Matthew Whited
If you dont cast in tolist what will be your results ?
Cédric Boivin
If you don't do anything with the where it will run really fast. But then again you don't do anything so what do you expect?
Matthew Whited
The where return a Ienumerable<T> so you can do someting with de return IEnumerable
Cédric Boivin
yeah you can pass it to a foreach block or you can run another extension method on it. You could even write your own extension methods that work with IEnumerable<T>. This allows you to build up expresion trees to build the queries behind things such as LINQ-2-SQL. If all it is really doing is building a delegate until you do something with it.
Matthew Whited
If you want a list the FindAll is your best bet. But if you are trying to do something more complex and need to worry about the memory then Where will probably be better. That is the fun of finding the balance between what you want to do and how you can do it.
Matthew Whited
Just iterating through it would force everything to be evaluated. Calling ToList will give different performance as that really *is* forcing it to do everything that FindAll does. It also doesn't give you the option of only iterating half way through the results.
Jon Skeet
That is why I also compared the results of .Count()
Matthew Whited
BTW. .ToArray() performs the same as .ToList(). And if you put this in a foreach or use the Enumerator in a some other block it will have more over head. Unless you do an early about as you said. But that wasn't the question.
Matthew Whited
Yup, Count provides a much better comparison than ToList/ToArray.
Jon Skeet
-1 You should explain WHY as Jon does
Jan Bannister
I believe your test is flawed - you haven't explored enough possibilities. See my edited answer for different scenarios.
Jon Skeet
If we are going for full out optimized I can beat both by doing the work inline for statements but the question was to compare FindAll to Where.
Matthew Whited
Yes - FindAll vs Where, not FindAll vs Where.ToList.
Jon Skeet