Hello, guys!

In my project I'm using Entity Framework 4 to work with data. I've run into a terrible performance problem with a simple query. When I looked in the profiler at the SQL query generated by EF4, I was shocked.

I have some tables in my entity data model:

[Image: data model diagram]

It looks pretty simple. I'm trying to select all product items from a specified category, together with all related navigation properties.

I wrote this LINQ query:

ObjectSet<ProductItem> objectSet = ...;
int categoryId = ...;

var res = from pi in objectSet.Include("Product").Include("Inventory").Include("Inventory.Storage")
          where pi.Product.CategoryId == categoryId
          select pi;

EF generated this SQL query:

SELECT   [Project1].[pintId1]          AS [pintId], 
[Project1].[pintId]           AS [pintId1], 
[Project1].[intProductId]     AS [intProductId], 
[Project1].[nvcSupplier]      AS [nvcSupplier], 
[Project1].[ nvcArticle]      AS [ nvcArticle], 
[Project1].[nvcBarcode]       AS [nvcBarcode], 
[Project1].[bIsActive]        AS [bIsActive], 
[Project1].[dtDeleted]        AS [dtDeleted], 
[Project1].[pintId2]          AS [pintId2], 
[Project1].[nvcName]          AS [nvcName], 
[Project1].[intCategoryId]    AS [intCategoryId], 
[Project1].[ncProductType]    AS [ncProductType], 
[Project1].[C1]               AS [C1], 
[Project1].[pintId3]          AS [pintId3], 
[Project1].[intProductItemId] AS [intProductItemId], 
[Project1].[intStorageId]     AS [intStorageId], 
[Project1].[dAmount]          AS [dAmount], 
[Project1].[mPrice]           AS [mPrice], 
[Project1].[dtModified]       AS [dtModified], 
[Project1].[pintId4]          AS [pintId4], 
[Project1].[nvcName1]         AS [nvcName1], 
[Project1].[bIsDefault]       AS [bIsDefault] 
FROM     (SELECT [Extent1].[pintId]         AS [pintId], 
[Extent1].[intProductId]   AS [intProductId], 
[Extent1].[nvcSupplier]    AS [nvcSupplier], 
[Extent1].[ nvcArticle]    AS [ nvcArticle], 
[Extent1].[nvcBarcode]     AS [nvcBarcode], 
[Extent1].[bIsActive]      AS [bIsActive], 
[Extent1].[dtDeleted]      AS [dtDeleted], 
[Extent2].[pintId]         AS [pintId1], 
[Extent3].[pintId]         AS [pintId2], 
[Extent3].[nvcName]        AS [nvcName], 
[Extent3].[intCategoryId]  AS [intCategoryId], 
[Extent3].[ncProductType]  AS [ncProductType], 
[Join3].[pintId1]          AS [pintId3], 
[Join3].[intProductItemId] AS [intProductItemId], 
[Join3].[intStorageId]     AS [intStorageId], 
[Join3].[dAmount]          AS [dAmount], 
[Join3].[mPrice]           AS [mPrice], 
[Join3].[dtModified]       AS [dtModified], 
[Join3].[pintId2]          AS [pintId4], 
[Join3].[nvcName]          AS [nvcName1], 
[Join3].[bIsDefault]       AS [bIsDefault], 
CASE 
WHEN ([Join3].[pintId1] IS NULL) THEN CAST(NULL AS int) 
ELSE 1 
END AS [C1] 
FROM   [ProductItem] AS [Extent1] 
INNER JOIN [Product] AS [Extent2] 
ON [Extent1].[intProductId] = [Extent2].[pintId] 
LEFT OUTER JOIN [Product] AS [Extent3] 
ON [Extent1].[intProductId] = [Extent3].[pintId] 
LEFT OUTER JOIN (SELECT [Extent4].[pintId]           AS [pintId1], 
[Extent4].[intProductItemId] AS [intProductItemId], 
[Extent4].[intStorageId]     AS [intStorageId], 
[Extent4].[dAmount]          AS [dAmount], 
[Extent4].[mPrice]           AS [mPrice], 
[Extent4].[dtModified]       AS [dtModified], 
[Extent5].[pintId]           AS [pintId2], 
[Extent5].[nvcName]          AS [nvcName], 
[Extent5].[bIsDefault]       AS [bIsDefault] 
FROM   [Inventory] AS [Extent4] 
INNER JOIN [Storage] AS [Extent5] 
ON [Extent4].[intStorageId] = [Extent5].[pintId]) AS [Join3] 
ON [Extent1].[pintId] = [Join3].[intProductItemId] 
WHERE  [Extent2].[intCategoryId] = 8 /* @p__linq__0 */) AS [Project1] 
ORDER BY [Project1].[pintId1] ASC, 
[Project1].[pintId] ASC, 
[Project1].[pintId2] ASC, 
[Project1].[C1] ASC

With 7,000 records in the database and ~1,000 records in the specified category, this query takes around 10 seconds to execute. That is not surprising if you look at this part:

FROM [ProductItem] AS [Extent1]
INNER JOIN [Product] AS [Extent2]
ON [Extent1].[intProductId] = [Extent2].[pintId]
LEFT OUTER JOIN [Product] AS [Extent3]
ON [Extent1].[intProductId] = [Extent3].[pintId]
***LEFT OUTER JOIN (SELECT ....***

A nested SELECT in a join... Horrible. I tried changing the LINQ query, but I get the same SQL output.

A solution using stored procedures is not acceptable for me, because I'm using a SQL Compact database.

PS: Sorry for my bad English.

+6  A: 

You are calling Include("Product").Include("Inventory").Include("Inventory.Storage"), and you are wondering why so many records are fetched and why you see such a big SQL query? Please make sure you understand what the Include method does. If you want a simpler query, use the following:

var res =
    from pi in objectSet
    where pi.Product.CategoryId == categoryId 
    select pi;

Note, however, that this will possibly load Products, Inventories and Storages lazily, which could cause many more queries to be sent when you iterate over those sub-collections.
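
To make that trade-off concrete, here is a minimal sketch of the "1 + N" effect. It does not use Entity Framework at all: the classes and the QueryCount counter are hypothetical stand-ins that only simulate one round trip for the item list and one more for each lazily loaded Product, so treat it as an illustration of the arithmetic rather than of EF's actual behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the EF entities; not the poster's real model.
public class Product
{
    public int Id { get; set; }
    public int CategoryId { get; set; }
}

public class ProductItem
{
    private readonly Lazy<Product> _product;

    public ProductItem(int id, Func<Product> loader)
    {
        Id = id;
        _product = new Lazy<Product>(loader);
    }

    public int Id { get; private set; }

    // The first access runs the loader, like a lazy navigation property.
    public Product Product { get { return _product.Value; } }
}

public static class LazyLoadDemo
{
    // Counts simulated database round trips.
    public static int QueryCount;

    // Simulates "SELECT ... FROM ProductItem": a single round trip.
    public static List<ProductItem> LoadItems(int count)
    {
        QueryCount++;
        return Enumerable.Range(1, count)
            .Select(i => new ProductItem(i, () =>
            {
                QueryCount++; // one extra round trip per item
                return new Product { Id = i, CategoryId = 8 };
            }))
            .ToList();
    }

    // Touches every item's Product and reports how many round trips it took.
    public static int CountRoundTrips(int itemCount)
    {
        QueryCount = 0;
        int checksum = 0;
        foreach (var item in LoadItems(itemCount))
        {
            checksum += item.Product.CategoryId; // triggers the lazy load
        }
        return QueryCount;
    }
}
```

Calling LazyLoadDemo.CountRoundTrips(1000) reports 1001 simulated round trips: one for the list plus one per item. With nested collections such as Inventory and Storage the count grows further, which is the cost to weigh against the single large Include query.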

Steven
+1 good point - with Product:ProductItem (1:*) and ProductItem:Inventory (1:*), a single Product will load a whole lot of extra (possibly unneeded) data... no wonder it's slow.
marc_s
A: 

I think the problem is with the Inventory collection in the Storage element. Your query will limit the Product, ProductItem and Inventory items selected to those for the specified CategoryId. However, in order to fill the Inventory collection of the Storage element, the query also has to return all Inventory rows that use the same StorageIds (and then all of the corresponding ProductItem and Product rows for those additional Inventory records).

I'd start by removing the Inventory collection from the Storage element, or by removing the corresponding Include.

Jeff Siver
A: 

Hi again!

Yes, the problem is in the Storage entity. If I remove it from the query, the generated SQL is acceptable. But I need all of these properties, including the Storage property, to be populated up front, because all of this data is intended to be sent to the client side and displayed as a table.

I would like to see something like this in the generated SQL:

SELECT *
FROM ProductItem 
INNER JOIN Product ON [Product].[pintId] = [ProductItem].[intProductId]
INNER JOIN Inventory ON [ProductItem].[pintId] = [Inventory].[intProductItemId]
INNER JOIN Storage ON [Storage].[pintId] = [Inventory].[intStorageId]
WHERE [Product].[intCategoryId] = @param

but NOT a JOIN (SELECT...) instead. This is the bottleneck, and I have no idea how to avoid it yet.

A query like this executes in several milliseconds, while the generated query takes about 10-12 seconds.

Ivan
Have you removed the Inventory collection from the Storage entity as I mentioned in my answer? I think it is the work to populate that collection that is causing the problems.
Jeff Siver
Yes, I tried removing the Inventory navigation property from the Storage entity, but the result is exactly the same: a nested SELECT statement in the JOIN :(
Ivan
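
For reference, one way to ask LINQ for the flat, joined row shape requested above is to drop the Include calls and project through the Inventory collection explicitly, selecting only the columns the table needs. The sketch below is an assumption-laden illustration, not a confirmed fix for this thread: the POCO classes, the ProductRow type and the FlatQuery.RowsForCategory helper are hypothetical, and the property names are guessed from the entity names used here. It runs against in-memory data via AsQueryable; whether EF4 on SQL Compact turns the same query shape into plain INNER JOINs would still need to be checked in the profiler.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical POCOs mirroring the entity names used in this thread.
public class Storage
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int CategoryId { get; set; }
}

public class Inventory
{
    public int Id { get; set; }
    public decimal Amount { get; set; }
    public Storage Storage { get; set; }
}

public class ProductItem
{
    public int Id { get; set; }
    public Product Product { get; set; }
    public ICollection<Inventory> Inventory { get; set; }
}

// Hypothetical flat row type: one row per (ProductItem, Inventory) pair.
public class ProductRow
{
    public int ProductItemId { get; set; }
    public string ProductName { get; set; }
    public string StorageName { get; set; }
    public decimal Amount { get; set; }
}

public static class FlatQuery
{
    public static IQueryable<ProductRow> RowsForCategory(
        IQueryable<ProductItem> items, int categoryId)
    {
        return from pi in items
               where pi.Product.CategoryId == categoryId
               from inv in pi.Inventory // SelectMany: one row per inventory record
               select new ProductRow
               {
                   ProductItemId = pi.Id,
                   ProductName = pi.Product.Name,
                   StorageName = inv.Storage.Name,
                   Amount = inv.Amount
               };
    }
}
```

Note that, like the INNER JOIN version of the SQL, this shape drops product items that have no inventory rows, which differs from what Include returns. If those items must be kept, iterating over pi.Inventory.DefaultIfEmpty() is the usual LINQ way to express an outer join, at the cost of null checks in the projection.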