views:

989

answers:

4

I recently came across a question in the Entity Framework forum on msdn: http://social.msdn.microsoft.com/Forums/en-US/adodotnetentityframework/thread/bb72fae4-0709-48f2-8f85-31d0b6a85f68

The person who asked the question tried to do a relatively simple query, involving two tables, a grouping, order by, and an aggregation using Linq-to-Entities. A pretty straightforward Linq query, and straightforward to do in SQL as well - the kind of stuff people try to do every day.

However, when using Linq-to-Entities the outcome is a complex query with lots of unnecessary joins etc. I tried it and wasn't able to get Linq-to-Entities to generate a decent SQL query from it if using just pure Linq against the EF entities.

Having seen a fair share of monster queries from EF I thought maybe the OP (and me, and others) are doing something wrong. Maybe there is a better way to do this?

So here's my challenge: using the example from the EF forum and using just Linq-to-Entities against the two entities, is it possible to get EF to generate a SQL query without unnecessary joins and other complexities?

I'd like to see EF generate something a little bit closer to what Linq-to-SQL does for the same kind of queries, while still using Linq against a EF model.

Restrictions: use EFv1 .net 3.5 SP1 or EFv4 (beta 1 is part of the VS2010/.net4 beta available for download from Microsoft). No CSDL->SSDL mapping tricks, model 'definingqueries', stored procs, db-side functions, or views allowed. Just plain 1:1 mapping between the model and the db and a pure L2E query that does what the original thread on MSDN asked. An association must exist between the two entities (i.e. my "workaround #1" answer to the original thread is not a valid workaround)

Update: 500pt bounty added. Have fun.

Update: As mentioned above, a solution that uses EFv4 / .net 4 (β1 or later) is of course eligible for the bounty. If you're using .net 4 post β1, please include build number (e.g. 4.0.20605), the L2E query you used, and the SQL it generated and sent to the DB.

Update: This issue has been fixed in VS2010 / .net 4 beta 2. Although the generated SQL still has a couple of [relatively harmless] extra levels of nesting, it doesn't do any of the nutty stuff it used to. The final execution plan after SQL Server's optimizer has had a go at it is now as good as it can be. +++ for the dudes and dudettes responsible for the SQL generating part of EFv4...

+1  A: 

If I was that worried about the crazy SQL, I just wouldn't do any of the grouping in the database. I would first query all of the data I needed by finishing it off with a ToList() while using the Include function to load all the data in a single select.

Here's my final result:

var list = from o in _entities.orderT.Include("personT")
           .Where(p => p.personT.person_id == person_id && 
                       p.personT.created >= fromTime && 
                       p.personT.created <= toTime).ToList()
           group o by new { o.name, o.personT.created.Year, o.personT.created.Month, o.personT.created.Day } into g
           orderby g.Key.name
           select new { g.Key, count = g.Sum(x => x.price) };

This results in a much simpler select:

SELECT 
1 AS [C1], 
[Extent1].[order_id] AS [order_id], 
[Extent1].[name] AS [name], 
[Extent1].[created] AS [created], 
[Extent1].[price] AS [price], 
[Extent4].[person_id] AS [person_id], 
[Extent4].[first_name] AS [first_name], 
[Extent4].[last_name] AS [last_name], 
[Extent4].[created] AS [created1]
FROM    [dbo].[orderT] AS [Extent1]
LEFT OUTER JOIN [dbo].[personT] AS [Extent2] ON [Extent1].[person_id] = [Extent2].[person_id]
INNER JOIN [dbo].[personT] AS [Extent3] ON [Extent1].[person_id] = [Extent3].[person_id]
LEFT OUTER JOIN [dbo].[personT] AS [Extent4] ON [Extent1].[person_id] = [Extent4].[person_id]
WHERE ([Extent1].[person_id] = @p__linq__1) AND ([Extent2].[created] >= @p__linq__2) AND ([Extent3].[created] <= @p__linq__3)

Additionally, with the example data provided, SQL Profiler only notices a 3 ms increase in duration of the SQL call.

Personally, I think that anyone that whines about not liking the output SQL of an ORM layer should go back to using Stored Procedures and Datasets. They simply aren't ready to evolve yet, and need to spend a few more years in the proverbial oven. :)

Lusid
Still, it has 2 joins to the person table more than it should. The challenge is to get rid of those. This is a relatively simple example so the db-side impact is not as great as it might be for more complex examples but the important thing is to get rid of extra joins that make it more difficult for the SQL Server optimizer to pick the right execution plan...
KristoferA - Huagati.com
Do the extra joins make it more difficult for the SQL Server optimizer? Query analyzer should be able to measure the impact of removing the two joins.
Sean Reilly
On a very simple query like this, the difference between 1 join and 3 joins is probably miniscule from the optimizer's point of view but I'm hoping to find an answer to this question that can then be applied to more complex queries. In a more real-world-scenario I might have queries requiring data from 15 tables and I am hoping that there is some 'magic trick' that can be applied to the queries that keeps it to 15 tables even if the DAL is based on L2E+EF.
KristoferA - Huagati.com
Don't get me wrong. I completely agree with you. From what I can tell, there isn't any standard way of working around this. I've seen numerous reports about this as a bug in the MSDN forums, and the big boys keep saying they are adding optimizations in the next release. Hopefully, all of the XML hacking to do simple things will be unnecessary in the next release as well.As with any new technology, we will just have to wait it out.
Lusid
I think you're right, we just have to wait and see if they will fix it in 4.x... Anyway, the idea with the bounty was to encourage people to hunt for a workaround and I hope it had that effect. Since your reply shows some effort to find a workaround I'll select it as the answer - and winner of the bounty. Thanks for trying. :)
KristoferA - Huagati.com
Thanks! I honestly wish I could've found a workaround. I learned some things myself in the process, so it was worth the time. :D
Lusid
A: 

The SQL that linq generates is very efficient. It may look bulky but it takes into account relations on tables and constraints etc. In my opinion you should just blindly use the linq commands and not worry about scale. There are benefits of the large queries as its automatically generated. It avoids any slip ups in relational constraints and adds its own wrappers for faults/exceptions.

If however you want to write the SQL's yourself and still want to work behind the confines of an ORM, then try iBatishttp://ibatis.apache.org/ You have to write the SQL's and joins yourself, so it gives you complete control over the backend model.

Personally, just use SQLMetal and linq. Dont worry about performance and scale, unless you need to.

Ritesh M Nayak
You're absolutely correct that Linq-to-SQL generates efficient SQL. See my response to the original thread. However, Linq-to-Entities is a different story and that is why I raised this challenge and that is also why I added a bounty to it. I am interested in getting L2E to behave like L2S and I want workarounds that make it do that...
KristoferA - Huagati.com
A: 

Off-topic and frankly I don't care if I get downvoted for this... but why are you using LINQ or an OR/M for a reporting query?

GuyIncognito
It doesn't have to get down voted, but why even write that? It’s irrelevant. Even if the reasoning is is "on a whim" or "I felt like it", the question is technically sound. Don't you think?
Ian Patrick Hughes
Reporting query? Every app I have ever written makes use of aggregates and groupings; 'total amount to pay', 'prescriptions filled today', 'passengers by class on flight XX123', 'current portfolio market value' etc. I don't know if the original query is a reporting query but I am interested in being able to do these every-day things from within my apps...
KristoferA - Huagati.com
...and that without having to mix technologies. If I use an OR mapper then I want to use it fully out.
KristoferA - Huagati.com
Actually, EF is designed with reporting in mind (even if it is bizarre to mix transaction and reporting), which is one of the reasons why ESQL exists - you can use ESQL against an edmx model even without any actual runtime types/objects.
Marc Gravell
+2  A: 

Interesting discussion. I have used 2 ORM models so far (NHibernate and LINQ-to-Entities). In my experience, there is always a line where you have to give up on ORM to generated SQL and resort back to stored procedures or views to achieve best scalable queries. Having said that, I personally think that LINQ works better on more normalized databases and all the nested queries/joins are not a major issue. There are some cases where, in order to increase performance or scalability, you have to use DB server features (indexed views for example on SQL 2008 SE works only with query hints) and you simply cannot use an ORM (except iBatis?).

Granted that you won't get the best performance or scalability by using these nested joins/queries generated by linq but please don't forget the advantages and development benefits given by LINQ (or NHibernate) in any project. Surely there must be some merit to it.

Finally, although I risk comparing apple and oranges but isn't think more like asking: Do you want rapid website development (asp.net webforms, swing) or more control on your HTML (asp.net mvc, RoR)? pick the thing that best suits your requirements.

My 2 cents!