views: 543
answers: 3

Right, this is an odd one.

We have a web service that returns data to a Silverlight client. The queries are generated against a SQL Server 2008 database using Entity Framework. Most of these queries are date-range based: pull results between this date and that, for example. Also, a view is used to make things a little easier.

We have noticed that when issuing a specific set of queries that start on or after a certain date, execution will be very slow. This date is the 5th of November, in any year. If our start date is one day earlier, execution is instant. So, 5th to 6th: slow. 4th to 6th: quick.

Here is the SQL that EF generates:

THIS QUERY WILL BE VERY SLOW (30 seconds)

    SELECT 
    1 AS [C1], 
    [GroupBy1].[K1] AS [Name], 
    [GroupBy1].[A1] AS [C2]
    FROM ( SELECT 
    [Extent1].[Name] AS [K1], 
    SUM([Extent1].[Value]) AS [A1]
    FROM (SELECT 
      [view_answers].[Value] AS [Value], 
      [view_answers].[Comment] AS [Comment], 
      [view_answers].[NewStockist] AS [NewStockist], 
      [view_answers].[NewDistPoint] AS [NewDistPoint], 
      [view_answers].[VoucherUsed] AS [VoucherUsed], 
      [view_answers].[CashTotal] AS [CashTotal], 
      [view_answers].[AnswerType] AS [AnswerType], 
      [view_answers].[StartTime] AS [StartTime], 
      [view_answers].[ActualEndTime] AS [ActualEndTime], 
      [view_answers].[Complete] AS [Complete], 
      [view_answers].[UserID] AS [UserID], 
      [view_answers].[UserName] AS [UserName], 
      [view_answers].[QuestionType] AS [QuestionType], 
      [view_answers].[ProductSKU] AS [ProductSKU], 
      [view_answers].[BrandID] AS [BrandID], 
      [view_answers].[TeamID] AS [TeamID], 
      [view_answers].[Name] AS [Name], 
      [view_answers].[Stage] AS [Stage], 
      [view_answers].[Question] AS [Question]
      FROM [dbo].[view_answers] AS [view_answers]) AS [Extent1]
    WHERE 
       ([Extent1].[UserID] = '16E3692F-806E-40A0-BB99-ABBBCC13060D') 
       AND (N'Distribution Points' = [Extent1].[QuestionType]) 
       AND ([Extent1].[StartTime] >= '11/05/2009 00:00:00') 
       AND ([Extent1].[StartTime] <= '11/08/2009 00:00:00') 
       AND (1 = [Extent1].[Complete]) 
       AND (2 = [Extent1].[BrandID]) 
       AND (N'Distribution Points' = [Extent1].[QuestionType])
    GROUP BY 
       [Extent1].[Name])  
    AS [GroupBy1]

THIS ONE WILL BE MUCH QUICKER

    SELECT 
    1 AS [C1], 
    [GroupBy1].[K1] AS [Name], 
    [GroupBy1].[A1] AS [C2]
    FROM ( SELECT 
    [Extent1].[Name] AS [K1], 
    SUM([Extent1].[Value]) AS [A1]
    FROM (SELECT 
      [view_answers].[Value] AS [Value], 
      [view_answers].[Comment] AS [Comment], 
      [view_answers].[NewStockist] AS [NewStockist], 
      [view_answers].[NewDistPoint] AS [NewDistPoint], 
      [view_answers].[VoucherUsed] AS [VoucherUsed], 
      [view_answers].[CashTotal] AS [CashTotal], 
      [view_answers].[AnswerType] AS [AnswerType], 
      [view_answers].[StartTime] AS [StartTime], 
      [view_answers].[ActualEndTime] AS [ActualEndTime], 
      [view_answers].[Complete] AS [Complete], 
      [view_answers].[UserID] AS [UserID], 
      [view_answers].[UserName] AS [UserName], 
      [view_answers].[QuestionType] AS [QuestionType], 
      [view_answers].[ProductSKU] AS [ProductSKU], 
      [view_answers].[BrandID] AS [BrandID], 
      [view_answers].[TeamID] AS [TeamID], 
      [view_answers].[Name] AS [Name], 
      [view_answers].[Stage] AS [Stage], 
      [view_answers].[Question] AS [Question]
      FROM [dbo].[view_answers] AS [view_answers]) AS [Extent1]
    WHERE 
       ([Extent1].[UserID] = '16E3692F-806E-40A0-BB99-ABBBCC13060D') 
       AND (N'Distribution Points' = [Extent1].[QuestionType]) 
       AND ([Extent1].[StartTime] >= '11/04/2009 00:00:00') 
       AND ([Extent1].[StartTime] <= '11/08/2009 00:00:00') 
       AND (1 = [Extent1].[Complete]) 
       AND (2 = [Extent1].[BrandID]) 
       AND (N'Distribution Points' = [Extent1].[QuestionType])
    GROUP BY 
       [Extent1].[Name])  
    AS [GroupBy1]

If we set the start date to the 5th Nov last year, execution will be slow; with the 4th Nov last year, it's fast again. Looking at the data in the database, there is nothing unusual around the 5th. Also, it appears that queries where the start date is after the 5th will run slowly.
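
For reference, the date literals in the generated SQL above are in an ambiguous month/day form, whose interpretation depends on the server's language and DATEFORMAT settings (this is a general observation, not something confirmed as the cause here). An ISO 8601 literal is read the same way under any settings, so rewriting the boundary dates is one quick thing to rule out:

    -- Ambiguous: parsed as 5 Nov or 11 May depending on DATEFORMAT
    WHERE [StartTime] >= '11/05/2009 00:00:00'

    -- Unambiguous ISO 8601 form, interpreted identically under any settings
    WHERE [StartTime] >= '2009-11-05T00:00:00'

If the server happened to interpret the boundary as a different date, the size of the range (and hence the plan the optimizer picks) could change.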

I'm stumped!

(the database is hosted remotely so I don't have direct access to it)

Thanks in advance

UPDATE

Thanks for the replies, guys. Firstly, I should probably make it clear that my knowledge of SQL Server is skin deep. I build databases of varying degrees of quality and then use something like LINQ to SQL or EF to work with them. So I feel a little out of my depth here.

Regarding the joins: the view I'm querying does include data from about 6-7 tables. I shall try and grab some stats next time at work and add some more info here. I don't really know much about execution plans, or whether they are something I can view through SQL Server Management Studio.

TIA

UPDATE Stats from slow query (3 row(s) affected)

Table 'tblProducts'. Scan count 0, logical reads 22, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblQuestionTypes'. Scan count 0, logical reads 1496, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblQuestions'. Scan count 0, logical reads 1496, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblSessions'. Scan count 0, logical reads 28551, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblAnswers'. Scan count 1, logical reads 1976256, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblCalls'. Scan count 1, logical reads 439, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblUsers'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Stats from fast query

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblAnswers'. Scan count 1, logical reads 7008, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblProducts'. Scan count 0, logical reads 22, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblQuestions'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblQuestionTypes'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblCalls'. Scan count 1, logical reads 439, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblSessions'. Scan count 1, logical reads 47, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'tblUsers'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

UPDATE The query execution plan (a new thing to me) is suggesting that I add a new index on a column in one of the tables: QuestionID in the answers table. The suggested SQL is as follows:

USE [brandfourcoke]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[tblAnswers] ([QuestionID])
INCLUDE ([CallID],[Value])
GO

The estimated improvement in query cost is 93%. Should I go ahead and do this? The database is in a live environment and uses SQL Server 2008's change tracking and the Sync Framework. As such, I'm always concerned that database changes will impact the tracked data and break things for my clients. Will adding an index reset the change tracking data? Thanks for all your help guys. I feel like a complete beginner here.
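
A sketch of what the suggested index might look like with the placeholder filled in (IX_tblAnswers_QuestionID is an invented name, not the one SSMS proposed, and the ONLINE option is only available in Enterprise edition):

    USE [brandfourcoke]
    GO
    CREATE NONCLUSTERED INDEX [IX_tblAnswers_QuestionID]  -- hypothetical name
    ON [dbo].[tblAnswers] ([QuestionID])
    INCLUDE ([CallID], [Value])
    WITH (ONLINE = ON)  -- Enterprise edition only; omit this clause otherwise
    GO

As I understand it, change tracking records DML changes to rows, and creating a nonclustered index doesn't modify row data, so it shouldn't reset the tracked changes; still, trying this on a copy of the database first would be prudent.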

+4  A: 

Without knowing anything more, my first guess would be that you are either hitting the tipping point of index usage for a particular operation, or that you have outdated or wrong statistical distribution information for the given column(s). It could be a few other things as well, such as suboptimal indexing (perhaps the wrong index, perhaps keys ordered incorrectly, etc.). Given that you are using SQL Server 2008, are you sure you haven't created a filtered index rather than a traditional full index on the particular column(s)? However, in order to determine that, we'd need to see a lot more information (i.e. schema, indexes, query plans, data distribution, statistics, etc.).

It might help if you could post the query plan used for each of the queries listed above; that would at least help us determine whether you are getting drastically different plans.
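
If outdated statistics do turn out to be the cause, refreshing them is a relatively low-risk first step. A sketch, using table names taken from the STATISTICS IO output in the question:

    UPDATE STATISTICS [dbo].[tblAnswers] WITH FULLSCAN;
    UPDATE STATISTICS [dbo].[tblSessions] WITH FULLSCAN;

FULLSCAN reads the whole table to rebuild the histogram, so it is slower than a sampled update but gives the optimizer the most accurate distribution information.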

chadhoc
ok, looking at the execution plan, it is telling me that I don't have an index! I guess this is a bad thing. My question is, should I add one? I am using SQL change tracking and am worried that adding an index may reset the change tracking, which would also be a bad thing. I've also added the IO info in the original post. Thanks
Sergio
SQL Server Management Studio is suggesting the following: USE [brandfourcoke] GO CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>] ON [dbo].[tblAnswers] ([QuestionID]) INCLUDE ([CallID],[Value]) GO
Sergio
Well, that's not an easy thing to answer (i.e. should you add the index). Naturally, if you want this particular query to run faster and adding the index does in fact significantly improve execution, then you should probably add it. However, it's really not a cut-and-dried answer, so it's something you should try out in your environment to determine whether it makes sense (indexing is a very vast, very involved topic). It's also hard to give any guidance on what to index without seeing the views you're talking about and/or the existing structures.
chadhoc
Certainly seems like a good place to start, however. You could also consider running the query through the Database Engine Tuning Advisor to see what it gives you as well (http://msdn.microsoft.com/en-us/library/ms166575.aspx)
chadhoc
Just to update: creating the index, along with another on a separate table, improved query time hugely. The query now runs in a couple of seconds! Still not sure of the fundamental reason for the massive swing in performance, though. Ah well.
Sergio
+4  A: 

Run the slow and fast query with SET STATISTICS IO ON and see if there is a significant difference in the number of logical reads/physical reads between the two.

Most likely there is a strong skew in the data. For instance, the plan for the fast one does a nested loop driven by a result of 10 rows (resulting in 10 nested lookups), while the slow one suddenly sees 10,000 rows where the previous one saw 10, resulting in 10,000 lookups. Although your query has no joins, the engine may use various access indexes and join those indexes with the clustered index. The actual execution plan will always reveal exactly what's going on.
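
The actual execution plan can also be captured from T-SQL rather than the SSMS GUI; a sketch:

    SET STATISTICS XML ON;
    GO
    -- run the slow (or fast) query here
    GO
    SET STATISTICS XML OFF;

This returns the actual plan as an XML document alongside the results; clicking it in SSMS opens the graphical plan, which can then be saved as a .sqlplan file. (The "Include Actual Execution Plan" toolbar button achieves the same thing interactively.)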

Remus Rusanu
Seems that's what's going on. I'm not entirely sure why it would suddenly need to pull in so much more data, but it is. :(
Sergio
Yep, 1976256 reads vs. 7008: looks like a table scan on tblAnswers vs. a range scan. Before you add the index, can you post the actual plans? Upload the .sqlplan files and post a link to them.
Remus Rusanu
Yep, sure. Before I stick 'em on the net: the plans don't include any sensitive info, do they? :) Not that I don't trust you; it's the rest of the net I worry about. I could email them?
Sergio
It's good to be vigilant, and you shouldn't trust me either :). The plans contain details about the names of the tables/indexes in the query, which you already posted with the statistics IO info. They do not contain any login/user/password credentials, so I would say it's a moderate risk to post them publicly. But as always, better safe than sorry. As for emailing them, I'd rather not turn a public forum into private consulting...
Remus Rusanu
A: 

That query is one reason why using generated SQL is often a terrible idea. As far as I can tell, this is the equivalent query if you had written it yourself:

    SELECT 
        1 AS [C1],
        [view_answers].[Name] AS [K1],
        SUM([view_answers].[Value]) AS [C2]
    FROM [view_answers]
    WHERE [view_answers].[UserID] = '16E3692F-806E-40A0-BB99-ABBBCC13060D'
        AND N'Distribution Points' = [view_answers].[QuestionType]
        AND [view_answers].[StartTime] >= '11/04/2009 00:00:00'
        AND [view_answers].[StartTime] <= '11/08/2009 00:00:00'
        AND 1 = [view_answers].[Complete]
        AND 2 = [view_answers].[BrandID]
    GROUP BY [view_answers].[Name]

Try this with both dates and see if you get the same results and the same delay when using Nov 5. What fields are indexed?
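
To answer the "what fields are indexed" question, a query against the catalog views along these lines would list the indexes on the main table (tblAnswers is taken from the IO stats in the question; substitute the other tables as needed):

    SELECT i.name AS index_name,
           i.type_desc,
           c.name AS column_name,
           ic.is_included_column
    FROM sys.indexes AS i
    JOIN sys.index_columns AS ic
        ON ic.object_id = i.object_id AND ic.index_id = i.index_id
    JOIN sys.columns AS c
        ON c.object_id = ic.object_id AND c.column_id = ic.column_id
    WHERE i.object_id = OBJECT_ID('dbo.tblAnswers')
    ORDER BY i.name, ic.key_ordinal;

Alternatively, sp_helpindex 'dbo.tblAnswers' gives a quick summary, though it doesn't show included columns.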

HLGEM
Whilst your query is much neater, it still runs slowly. Check out the updates to the question for some more info.
Sergio