views: 73
answers: 2
Hi

We have a site in development; when we deployed it to the client's production server, we started getting query timeouts after a couple of hours.

This was with a single user testing it, and on our own server (which is identical in terms of SQL Server version - 2005 SP3) we have never had the same problem.

One of our senior developers had come across similar behaviour in a previous job; he ran a query to manually update the statistics and the problem magically went away - the query returned in a few milliseconds.

A couple of hours later, the same problem occurred. So we again manually updated the statistics and again, the problem went away. We've checked the database properties and, sure enough, auto update statistics is TRUE.

As a temporary measure, we've set a task to update stats periodically, but clearly, this isn't a good solution.
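In case it helps, the stopgap task is essentially just the following (a minimal sketch; the database and table names are placeholders, not our real schema):

```sql
-- Stopgap: resample statistics across the whole database.
-- sp_updatestats only touches statistics with row modifications since the
-- last update, so it is cheaper than UPDATE STATISTICS ... WITH FULLSCAN.
USE ClientDatabase;   -- placeholder for the client's actual database
GO
EXEC sp_updatestats;
GO

-- Or target just the table(s) involved in the slow query:
-- UPDATE STATISTICS dbo.SomeLargeTable WITH FULLSCAN;   -- placeholder table name
```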

The developer who experienced this problem before is certain it's an environment problem - when it occurred for him previously, it went away of its own accord after a few days.

We have examined the SQL Server installation on their DB server and it's not what I would regard as normal. Although they have SQL 2005 installed (and not 2008), there's an empty "100" folder in the installation directory. There are also MSSQL.1, MSSQL.2, MSSQL.3 and MSSQL.4 folders (which is where the executables and data are actually stored).

If anybody has any ideas we'd be very grateful - I'm of the opinion that rather than the statistics failing to update, they are somehow becoming corrupt.

Many thanks

Tony

+2  A: 

It's not that the statistics are outdated. What happens is that when you update statistics, all plans get invalidated and the bad cached plan gets evicted. Things run smoothly until a bad plan gets cached again and causes slow execution.

The real question is why you get bad plans to start with. We can get into lengthy technical and philosophical arguments about whether a query processor should create a bad plan in the first place, but the fact is that, when applications are written in a certain way, bad plans can happen. The typical example is having a WHERE clause like (@somevariable IS NULL) OR (somefield = @somevariable). Ultimately, 99% of bad plans can be traced to developers writing queries with C-style procedural expectations instead of sound, set-based, relational processing.
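To make that concrete, a catch-all search of this shape (all names here are made up for illustration) forces a single cached plan to serve both the "parameter supplied" and "parameter omitted" cases:

```sql
-- Hypothetical catch-all search: one plan has to cover both cases.
CREATE PROCEDURE dbo.SearchOrders      -- placeholder procedure/table names
    @CustomerId INT = NULL
AS
    SELECT OrderId, CustomerId, OrderDate
    FROM dbo.Orders
    WHERE (@CustomerId IS NULL) OR (CustomerId = @CustomerId);
-- When first compiled for @CustomerId = NULL the optimizer tends to pick a scan,
-- and that scan plan is then reused for calls that pass a specific customer.
```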

What you need to do now is identify the bad queries. It's really easy: just check sys.dm_exec_query_stats; the bad queries will stand out in terms of total_elapsed_time and total_logical_reads. Once you've identified the bad plan, you can take corrective measures, which differ from query to query.
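Something along these lines will surface them (a sketch; adjust the TOP count and the ordering to taste):

```sql
-- Top offenders by cumulative elapsed time, with the offending statement text.
SELECT TOP (20)
    qs.total_elapsed_time,
    qs.total_logical_reads,
    qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC;
```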

Remus Rusanu
@Remus Rusanu: Why does "(@somevariable IS NULL) OR (somefield = @somevariable)" cause problems?
Vash
@Vash, that can't use an index, so it table scans. Combine that with user transactions on that table and you have blocking and waiting. Read [Dynamic Search Conditions in T-SQL by Erland Sommarskog](http://www.sommarskog.se/dyn-search.html); here is the part on ["(@somevariable IS NULL) OR (somefield = @somevariable)"](http://www.sommarskog.se/dyn-search-2005.html#OR_ISNULL).
KM
-1 Parameter sniffing can be caused by a poor choice of default or an atypical value. Not all developers are muppets at SQL. I used to be able to demo parameter sniffing on demand. Now, of course, you're on the SQL Server development team and arguably you could be defending your product by blaming developers. Bad plans caused by parameter sniffing *happen* in real life.
gbn
The queries are not bad queries - the where clauses can use indexes. And an obvious point to note is that this doesn't occur in the staging environment.
eggheaddesign
What is important is that the problem queries can be identified and once identified they can be fixed.
Remus Rusanu
@Remus Rusanu: yes, via masking/UNKNOWN etc. to fix a known issue. Some bad plans will be down to poor SQL skills, but not "99%" as you claim.
gbn
+3  A: 

Disagreeing with Remus...

Parameter sniffing allows SQL Server to guess the optimal plan for a wide range of input values. Sometimes it's wrong and the plan is bad because of an atypical value or a poorly chosen default.

I used to be able to demonstrate this on demand by changing a default between 0 and NULL: plan and performance changed dramatically.

A statistics update will invalidate the plan. The query will thus be recompiled and cached when next used.

The workarounds are one of the following (a rough sketch of the first two follows the list):

  • parameter masking
  • use the OPTIMIZE FOR UNKNOWN hint
  • duplicate "default"
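
For illustration, the first two look roughly like this (made-up procedure and table names; note that OPTIMIZE FOR UNKNOWN only exists from SQL Server 2008 onwards, so on 2005 masking is the one that applies):

```sql
-- Parameter masking (works on SQL Server 2005): copy the parameter into a
-- local variable so the optimizer cannot sniff the caller's value and
-- instead compiles for the average case based on column statistics.
CREATE PROCEDURE dbo.SearchOrders      -- placeholder procedure/table names
    @CustomerId INT = NULL
AS
    DECLARE @CustomerIdMasked INT;
    SET @CustomerIdMasked = @CustomerId;

    SELECT OrderId, CustomerId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = @CustomerIdMasked;
GO

-- On SQL Server 2008 and later the hint does the same job without the local variable:
-- SELECT OrderId, CustomerId, OrderDate
-- FROM dbo.Orders
-- WHERE CustomerId = @CustomerId
-- OPTION (OPTIMIZE FOR UNKNOWN);
```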

See these SO questions

Now, Remus works for the SQL Server development team. However, this phenomenon is well documented by Microsoft on their own website, so blaming developers is unfair.

gbn
The queries are not bad queries - the where clauses can use indexes. And an obvious point to note is that this doesn't occur in the staging environment.
eggheaddesign
@eggheaddesign: I agree. I offer 3 permanent fixes...
gbn
@gbn - thanks. It's way past our bedtime here in the UK (ok, our home time), so we'll be looking at your fixes in the morning now.
eggheaddesign