views:

35

answers:

2

I had a LINQ query that loads a hierarchy of objects like the following.

Query #1

var result = db.Orders
               .Include("Customer")
               // many other .Include() here
               .FirstOrDefault(x => x.Customer.CustomerId == 1 &&
                                    x.OrderId == orderId);

I was having MAJOR performance problem with it.
The CPU usage was near 100% and memory usage was very high.

And I tweaked it to the following and the performance problem was fixed.

Query #2

var result = db.Orders
               .Include("Customer")
               // many other .Include() here
               .Where(x => x.Customer.CustomerId == 1 &&
                           x.OrderId == orderId)
               .FirstOrDefault();



I just want to confirm my suspicion.
Query #1 is probably looping through all my records in memory looking for a matching record
vs
Query #2 filters the records on the Database and then getting the first record only.

Is that why the Query #1 has performance problems?

Just to be safe, do I need to use the .Select(x => x) before the .FirstOrDefault()?

Query #3

var result = db.Orders
               .Include("Customer")
               // many other .Include() here
               .Where(x => x.Customer.CustomerId == 1 &&
                           x.OrderId == orderId)
               .Select(x => x)
               .FirstOrDefault();
A: 

No, they both should result in a same SQL query when being executed. You can prove it by looking into SQL Profiler and see what is the exact SQL being submitted from EF in both cases. Your performance optimization should have been caused by some other factors. Here is why:

Include method returns an ObjectQuery<T>:

public class ObjectQuery<T> : ObjectQuery, IOrderedQueryable<T>, 
                              IQueryable<T>, IEnumerable<T>, 
                              IOrderedQueryable, IQueryable, 
                              IEnumerable, IListSource

Which means its FirstOrDefault method comes with 2 overloads:

// Defined by Enumerable:
FirstOrDefault(Func<T, Boolean>)

// Defined by Queryable:
FirstOrDefault(Expression<Func<T, Boolean>>)

When you code .FirstOrDefault(x => x.Customer.CustomerId == 1 compiler will go into a process called Overload Resolution to infer the type of the lambda expression x => x.Customer.CustomerId == 1 since it is convertible to the type of both overload's parameter types.
Compiler will use an algorithm (that I am still trying to find in C# Language Specification!), figure out that converting the lambda to the Expression<Func<T, Boolean> is a better conversion than to Func<T, Boolean> so pick the IQueryable overload.
Therefore, you'll see the predicate in the generated SQL when observing it in the SQL Profiler.

Morteza Manavi
Hi Morteza, you answered a similar question a bit differently but told him to include "using System.Linq.Expressions;". I don't have that using statement. Is that why I'm having a performance problem? http://stackoverflow.com/questions/3540410/ef-4-0-weird-behaviour-of-firstordefault-method/3540477#3540477
stun
Thanks for reminding that. I agree, it's different, but I am still insisting on my new answer here and I update my answer to show why. Please run them both and see the result in the Profiler, you'll see that they both result in the exactly same SQL and please let me know if it's not.
Morteza Manavi
A: 

I found the culprit. It's the SQL query generated by Entity Framework.
I have a complicated Schema with a lot of many-to-many relationships.

Entity Framework was generating a 32,000 line long SQL string :'(
Now, I am changing my code to load the hierarchy manually for some part.

Please let me know if anyone knows some good articles to read about Eager Loading and Many-to-Many relationships.

stun