views:

54

answers:

3

I have a project that was "spiked" using Linq2SQL and is now running into some major query performance issues. Go figure.

Linq actually works pretty well in simple query and command scenarios, but there are a couple filter intense queries that need to be rewritten as Sprocs.

I'm wondering if someone can give me some high level pointers to save time on optimizing this monster query that is generated by Linq.

Off the top of my head, I think that replacing all the "In(@p1, @p2)" where clauses with Inner Joins to temp tables would be a good start.

All foreign key and where clause columns are indexed.

Any insights are appreciated.

Here's the code:

SELECT [t9].[ID], [t9].[Description], [t9].[AreaCodeID], [t9].[BedroomCodeID], [t9].[BathroomCodeID], [t9].[DwellingCodeID], [t9].[LandlordID], [t9].[ParsedItemID], [t9].[DeletedReasonID], [t9].[CoordinateID], [t9].[Address], [t9].[PhonePrefix1], [t9].[Phone1], [t9].[PhonePrefix2], [t9].[Phone2], [t9].[EmailAddress], [t9].[RentAmount], [t9].[SquareFeet], [t9].[DateAvailable], [t9].[DateCreated], [t9].[IsDeleted], [t9].[RowVersion], [t9].[ID2], [t9].[ParentAreaCodeID], [t9].[AreaGroupID], [t9].[Description2], [t9].[Order], [t9].[IsTopLevelArea], [t9].[IsDeleted2], [t9].[ID3], [t9].[CityID], [t9].[Description3], [t9].[Order2], [t9].[IsPrimary], [t9].[IsDeleted3], [t9].[ID4], [t9].[HostURL], [t9].[Description4], [t9].[Rate], [t9].[RateTax], [t9].[RateTaxCode], [t9].[Currency], [t9].[LogoImageFileName], [t9].[FlashQuotesFileName], [t9].[TestimonialQuotesFileName], [t9].[GoogleAnalyticsTrackingCode], [t9].[GoogleMapsAPIKey], [t9].[IDHash], [t9].[test], [t9].[ID5], [t9].[Description5], [t9].[ID6], [t9].[Description6], [t9].[Order3], [t9].[IsDeleted4], [t9].[ID7], [t9].[Description7], [t9].[IsDeleted5], [t9].[Order4]
FROM (
    SELECT TOP (100) [t0].[ID], [t0].[Description], [t0].[AreaCodeID], [t0].[BedroomCodeID], [t0].[BathroomCodeID], [t0].[DwellingCodeID], [t0].[LandlordID], [t0].[ParsedItemID], [t0].[DeletedReasonID], [t0].[CoordinateID], [t0].[Address], [t0].[PhonePrefix1], [t0].[Phone1], [t0].[PhonePrefix2], [t0].[Phone2], [t0].[EmailAddress], [t0].[RentAmount], [t0].[SquareFeet], [t0].[DateAvailable], [t0].[DateCreated], [t0].[IsDeleted], [t0].[RowVersion], [t1].[ID] AS [ID2], [t1].[ParentAreaCodeID], [t1].[AreaGroupID], [t1].[Description] AS [Description2], [t1].[Order], [t1].[IsTopLevelArea], [t1].[IsDeleted] AS [IsDeleted2], [t2].[ID] AS [ID3], [t2].[CityID], [t2].[Description] AS [Description3], [t2].[Order] AS [Order2], [t2].[IsPrimary], [t2].[IsDeleted] AS [IsDeleted3], [t3].[ID] AS [ID4], [t3].[HostURL], [t3].[Description] AS [Description4], [t3].[Rate], [t3].[RateTax], [t3].[RateTaxCode], [t3].[Currency], [t3].[LogoImageFileName], [t3].[FlashQuotesFileName], [t3].[TestimonialQuotesFileName], [t3].[GoogleAnalyticsTrackingCode], [t3].[GoogleMapsAPIKey], [t3].[IDHash], [t5].[test], [t5].[ID] AS [ID5], [t5].[Description] AS [Description5], [t6].[ID] AS [ID6], [t6].[Description] AS [Description6], [t6].[Order] AS [Order3], [t6].[IsDeleted] AS [IsDeleted4], [t7].[ID] AS [ID7], [t7].[Description] AS [Description7], [t7].[IsDeleted] AS [IsDeleted5], [t7].[Order] AS [Order4]
    FROM [dbo].[Listing] AS [t0]
    INNER JOIN ([dbo].[AreaCode] AS [t1]
        INNER JOIN ([dbo].[AreaGroup] AS [t2]
            INNER JOIN [dbo].[City] AS [t3] ON [t3].[ID] = [t2].[CityID]) ON [t2].[ID] = [t1].[AreaGroupID]) ON [t1].[ID] = [t0].[AreaCodeID]
    LEFT OUTER JOIN (
        SELECT 1 AS [test], [t4].[ID], [t4].[Description]
        FROM [dbo].[BathroomCode] AS [t4]
        ) AS [t5] ON [t5].[ID] = [t0].[BathroomCodeID]
    INNER JOIN [dbo].[BedroomCode] AS [t6] ON [t6].[ID] = [t0].[BedroomCodeID]
    INNER JOIN [dbo].[DwellingCode] AS [t7] ON [t7].[ID] = [t0].[DwellingCodeID]
    WHERE (NOT ([t0].[IsDeleted] = 1)) AND (EXISTS(
        SELECT NULL AS [EMPTY]
        FROM [dbo].[ListingMiscellaneousCode] AS [t8]
        WHERE ([t8].[MiscellaneousCodeID] IN (@p0, @p1, @p2, @p3, @p4, @p5, @p6)) AND ([t8].[ListingID] = [t0].[ID])
        )) AND ([t0].[DwellingCodeID] IN (@p7)) AND ([t0].[AreaCodeID] IN (@p8, @p9, @p10, @p11, @p12, @p13, @p14, @p15, @p16, @p17, @p18, @p19, @p20, @p21, @p22, @p23, @p24, @p25, @p26, @p27, @p28, @p29, @p30, @p31, @p32, @p33, @p34, @p35, @p36, @p37, @p38, @p39, @p40, @p41, @p42, @p43, @p44, @p45, @p46, @p47, @p48, @p49, @p50, @p51, @p52, @p53, @p54, @p55, @p56, @p57, @p58, @p59, @p60, @p61, @p62, @p63, @p64, @p65, @p66, @p67, @p68, @p69, @p70, @p71, @p72, @p73, @p74, @p75, @p76, @p77, @p78, @p79, @p80, @p81, @p82, @p83, @p84, @p85, @p86, @p87, @p88, @p89, @p90, @p91, @p92, @p93, @p94, @p95, @p96, @p97, @p98, @p99, @p100, @p101, @p102, @p103, @p104, @p105, @p106, @p107, @p108, @p109, @p110, @p111))
    ) AS [t9]
ORDER BY [t9].[DateCreated] DESC, [t9].[RentAmount], [t9].[Description2]

As you might have been able to guess, the problem section is entirely located in the Where clause. Removal of this results in the query going very quickly.

Even with this Where Clause, it's not that slow, (about 1 sec), but the problem is, that I'm also having to return various counts of the current data, based on similar style queries. The whole process is taking in excess of 5 seconds due to the multiple queries with poor Where Clauses.

Another thing I don't understand, is that changing the Page size of the query, ie "...Select TOP (100)..." to a higher number like "...Select TOP (5000)..." doesn't slow the query down AT ALL. This is strange to me, and is more evidence I think that the problem is hopefully fixable with modified sql.

You'll also notice that one Where clause in particular (for areacodeid) is querying nearly 100 parameters. This is by design. Now I can put in a hack in the parent table to reduce this at the expense of some de-normalization, but I'm hoping that there is a pure sql fix first that will let me efficiently join to a temp table with 100s of params.

Thanks for your help.

A: 

There's nothing in there that looks "bad" per-se. Looking at it, it seems scary but after doing some reformatting and removing the extraneous brackets, it's not that bad. I've never liked nested JOINs, and I'd look at cleaning that up, but that's a personal preference: I don't think it'll do anything for performance.

So ... if taking out the WHERE clause produces a speed up, I'd be looking at indexes and implicit conversions. The first is self-explanatory; the second I've been burnt by before. Both can be detected by analysing the execution plan.

Effectively, an implicit conversion is bad when SQL Server converts the data in the database column rather than converting the parameter that is being compared with a database column. This can happen when a VARCHAR database column is being compared to an NVARCHAR paramater: SQL Server can't do a straight comparison because VARCHAR != NVARCHAR, so it promotes the VARCHAR data in the table column to NVARCHAR before doing a comparison. The result is a full index scan rather than an index seek which can slaughter performance for large tables.

I'd look at the execution plan, see if you have any index seeks and if you do, look to see if there's any implicit conversions of database columns happening behind them.

Chris J
+1  A: 

Are there indexes on any of the useful columns in the WHERE clause (ListingMiscellaneousCode, MiscellaneousCodeID, DwellingCodeID, AreaCodeID? Have you considered passing a single string for the parameter list, instead of having 100+ individual parameters? Typically you want the opposite, but in this case I think it may be justified. First I would create a numbers table, 500 rows is probably sufficient:

SET NOCOUNT ON;
DECLARE @UpperLimit INT;
SET @UpperLimit = 500;

    WITH n AS
    (
        SELECT
            x = ROW_NUMBER() OVER
            (ORDER BY s1.[object_id])
        FROM       [master].sys.columns AS s1
        CROSS JOIN [master].sys.columns AS s2
    )
    SELECT [Number] = x
      INTO dbo.Numbers
      FROM n
      WHERE x BETWEEN 1 AND @UpperLimit;
    GO
    CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers([Number]);
    GO

Now create a function that can parse a list of strings:

CREATE FUNCTION dbo.SplitINTs
(
    @List       VARCHAR(MAX),
    @Delimiter  NVARCHAR(10)
)
RETURNS TABLE
AS
    RETURN
    (
        SELECT DISTINCT
            [Value] = CONVERT(INT, LTRIM(RTRIM(
                SUBSTRING(@List, [Number],
                CHARINDEX
                (
                  @Delimiter, @List + @Delimiter, [Number]
                ) - [Number]))))
        FROM
            dbo.Numbers
        WHERE
            Number <= LEN(@List)
            AND SUBSTRING
            (
              @Delimiter + @List, [Number], LEN(@Delimiter)
            ) = @Delimiter
    );
GO

Now your query can say:

DECLARE @MiscCodeIDs VARCHAR(MAX), @AreaCodeIDs VARCHAR(MAX);
SELECT @MiscCodeIDs = '1,2,3,4,5...', @AreaCodeIDs = '6,7,8,9,10...';

SELECT <obnoxiously large column list>
FROM
...
INNER JOIN dbo.AreaCodes AS t1
ON ...
INNER JOIN dbo.SplitInts(@AreaCodeIDs, N',') AS acs
ON t1.AreaCodeID = acs.[Value]
...

AND 
(
    EXISTS
    (
              SELECT 1
         FROM [dbo].[ListingMiscellaneousCode] AS [t8]
      INNER JOIN dbo.SplitInts(@MiscCodeIDs, N',') AS m
      ON m.[Value] = t8.MiscellaneousCodeID
      AND ([t8].[ListingID] = [t0].[ID])
    )
)
...

I'm assuming these IDs are INTs. If they are strings, just take out the CONVERT(INT) in the function (and you may want to use NVARCHAR in case you need to support Unicode).

Aaron Bertrand
A: 

First, I think you have a bug in there... the SELECT TOP 100 will pull back a random 100 and then the ORDER BY [t9].[DateCreated] DESC will sort them. This doesn't give you the last 100 created.

You can't actually need to return 59 columns? Limit this.

I think

([t0].[AreaCodeID] IN (@...

should be

[t1].[ID] IN (@...

And there should be a unique index on [dbo].[AreaCode].ID

Given that indexes perform better with a BETWEEN ran rather than all of the values spelled out I would also see if I could collapse the 100 values to something more like: [t1].[ID] BETWEEN @p1 and @p2 and [t1].[ID] in (@p3.... That might be some coding on your part.

But I would really look at where the 100 area codes came from.... you have the concept of AreaCodeGroup, but it doesn't look like it is being used.

JBrooks
Hi JBrooks,You're right about the bug. Thanks for that.I just included the full linq query for ease. Even selecting ONLY the ID of the first table results in a long query due to the filters. I've provided the 3 things that ultimately fixed the performance below.Thanks.
Scott