views:

82

answers:

2

I have a parent-child table relationship. In a repository, I'm doing this:

return (from p in _ctx.Parents  
.Include( "Children" )  
select p).AsQueryable<Parent>();  

Then in a filter, I want to filter the parent by a list of child ids:

IQueryable<Parent> qry;  // from above
List<int> ids;  // huge list (8500)
var filtered =
from p in qry.Where( p => p.Children.Any(c => ids.Contains(c.ChildId)) ) select s;  

My list of ids is huge. This generates a simple SQL statement that does have a huge list of ids "in (1,2,3...)", but it takes no appreciable time to run by itself. EF, however, takes about a full minute just to generate the statement. I proved this by setting a breakpoint and calling:

((ObjectQuery<Parent>)filtered).ToTraceString();

This takes all the time. Is the problem in my last linq statement? I don't know any other way to do the equivalent of Child.ChildId in (ids). And even if my linq statement is bad, how in the world should this take so long?

+2  A: 

Re-write your query in Lambda syntax and it will cut the time by as much as 3 seconds (or at least it did for my EF project).

return _ctx.Parents.Include( "Children" ).AsQueryable<Parent>();  

and

IQueryable<Parent> qry;  // from above
List<int> ids;  // huge list (8500)
var filtered = qry.Where( p => p.Children.Any(c => ids.Contains(c.ChildId)) );
Nate Zaugg
`Select(s => s)` doesn't do anything. End it after the Where.
StriplingWarrior
Right...Thanks!
Nate Zaugg
This made no difference in my case.
dudeNumber4
Are you using EF4 on the .NET 4 framework? I only saw the problem on the first release of EF.
Nate Zaugg
+4  A: 

Unfortunately, building queries in Linq to Entities is a pretty heavy hit, but I've found it usually saves time due to the ability to build queries from their component pieces before actually hitting the database.

It is likely that the way they implement the Contains method uses an algorithm that assumes that Contains is generally used for a relatively small set of data. According to my tests, the amount of time it takes per ID in the list begins to skyrocket at around 8000.

So it might help to break your query into pieces. Group them into groups of 1000 or less, and concatenate a bunch of Where expressions.

var idGroups = ids.GroupBy(i => i / 1000);
var q = Parents.Include("Children").AsQueryable();
var newQ = idGroups.Aggregate(q, 
    (s, g) => s.Concat(
                  q.Where(w => w.Children.Any(wi => g.Contains(wi.ChildId)))));

This speeds things up significantly, but it might not be significantly enough for your purposes, in which case you'll have to resort to a stored procedure. Unfortunately, this particular use case just doesn't fit into "the box" of expected Entity Framework behavior. If your list of ids could begin as a query from the same Entity Context, Entity Framework would have worked fine.

StriplingWarrior
Upvoted, but didn't mark as answer. Using stored procs is the answer, but there's no way I'm going to use the broken query solution. I'll just use the broken EF solution...
dudeNumber4
I would encourage you to either start a bounty or mark an answer. Even if you write your own stored-proc based answer to your question, the FAQ clearly states that this is acceptable. Of course, I would also point out that I not only answered the question as asked, but also provided the solution that it sounds like you're planning to use.
StriplingWarrior
This is likely the best answer you're going to get. Short of someone coming out with a fix for the framework, this is the answer.
Nate Zaugg