views:

56

answers:

2

I am developing a system which periodically (4-5 times daily) runs a select statement, that normally takes less than 10 seconds but periodically has taken up to 40 minutes.

The database is on Windows Server 2008 + SQL Server 2008 R2; both 64bit.

There is a service on the machine running the database which polls the database and generates values for records which require it. These records are then periodically queried using a multi table join select from a service on a second machine written in C++ (VS 2010) using the MFC CRecordset class to extract the data. An example of the the query causing the problem is shown below.

SELECT DISTINCT "JobKeysFrom"."Key" AS "KeyFrom","KeysFrom"."ID" AS "IDFrom",
"KeysFrom"."X" AS "XFrom","KeysFrom"."Y" AS "YFrom","JobKeysTo"."Key" AS "KeyTo",
"KeysTo"."ID" AS "IDTo","KeysTo"."X" AS "XTo","KeysTo"."Y" AS "YTo",
"Matrix"."TimeInSeconds","Matrix"."DistanceInMetres","Matrix"."Calculated"
FROM "JobKeys" AS "JobKeysFrom"
INNER JOIN "JobKeys" AS "JobKeysTo" ON 
("JobKeysFrom"."Key"<>"JobKeysTo"."Key") AND 
("JobKeysFrom"."JobID"=531) AND 
("JobKeysTo"."JobID"=531)
INNER JOIN "Keys" AS "KeysFrom" ON 
("JobKeysFrom"."Key"="KeysFrom"."Key") AND ("JobKeysFrom"."Status"=4)
INNER JOIN "Keys" AS "KeysTo" ON 
("JobKeysTo"."Key"="KeysTo"."Key") AND ("JobKeysTo"."Status"=4)
INNER JOIN "Matrix" AS "Matrix" ON 
("Matrix"."IDFrom"="KeysFrom"."ID") AND ("Matrix"."IDTo"="KeysTo"."ID")
ORDER BY "JobKeysFrom"."Key","JobKeysTo"."Key"

I have tried the following

  1. checked the indexes and all seem correct and they are active and are being used according to the query
  2. the design advisor comes back with no suggestions
  3. I have tried defragging the indexes and data
  4. rebuilt the database from scratch by exporting the data and reimporting it in a new database.
  5. ran the profiler on it and found that when it goes wrong it seems to do many millions (up to 100 million) of reads rather than a few hundred thousand.
  6. ran the database on a different server

During the time it is running the query, I can run exactly the same query in the management studio window and it will be back to running in 10 seconds. The problem does not seem to be lock, deadlock, CPU, disk or memory related as it has done it when the machine running the database was only running this one query. The server has 4 processors and 16 gb of memory to run it in. I have also tried upgrading the disks to much faster ones and this had no effect.

It seems to me that it is almost as though the database receives the query, starts to process it and then goes to sleep for 40 minutes or runs the query without using the indexes.

When it takes a long time it will eventually finish and send the query results (normally about 70-100000 records) back to the calling application.

Any help or suggestions would be gratefully received, many thanks

+1  A: 

This sounds very much like parameter sniffing.

When a stored procedure is invoked and there is no existing execution plan in the cache matching the set options for the connection a new execution plan will be compiled using the parameter values passed in on that invocation.

Sometimes this will happen when the parameters passed are atypical (e.g. have unusually high selectivity) so the generated plan will not be suitable for most other invocations with different parameters. For example it may choose a plan with index seeks and bookmark lookups which is fine for a highly selective case but poor if it needs to be done hundreds of thousands of times.

This would explain why the number of reads goes through the roof.

Your SSMS connection will likely have different SET ... options so will not get handed the same problematic plan from the cache when you execute the stored procedure inside SSMS

You can use the following to get the plan for the slow session

select p.query_plan, *
from sys.dm_exec_requests r
cross apply sys.dm_exec_query_plan(r.plan_handle) p
where r.session_id = <session_id>

Then compare with the plan for the good session.

If you do determine that parameter sniffing is at fault you can use OPTIMIZE FOR hints to avoid it choosing the bad plan.

Martin Smith
I think we build our discourses on the invalid initial conditions and statements :)
igor
A: 

Check that you don't have a maintenance task running that is rebuilding indexes, or that your database statistics are somehow invalid when the query is executed.

This is exactly the sort of thing one would expect to see if the query is not using your indexes, which is usually because either the indexes are not accessible to the query at the point it runs or because the statistics are invalid and make the optimiser believe that your large table(s) only have a few rows in them and the query would run faster with a full table scan than using indexed access.

Cruachan
You missed the point that a concurrent execution of the same query returns quickly?
Martin Smith