We have a query that takes around 5 seconds on our production system, but on our mirror system (as identical as possible to production) and our dev systems it takes under 1 second.

We have checked the query plans and we can see that they differ; from these plans we can also see why one is taking longer than the other. The data, schema and servers are similar, and the stored procedures are identical.

We know how to fix it by rearranging the joins and adding hints. However, at the moment it would be easier if we didn't have to make any changes to the SProc (changing it means paperwork). We have also tried sp_recompile.

What could cause the difference between the two query plans?

System: SQL 2005 SP2 Enterprise on Win2k3 Enterprise

Update: Thanks for your responses, it turns out that it was statistics. See summary below.

+9  A: 

Your statistics are most likely out of date. If your data is the same, recompute the statistics on both servers and recompile. You should then see identical query plans.

Also, double-check that your indexes are identical.
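A minimal T-SQL sketch of "recompute the statistics and recompile", to be run on both servers. The table dbo.Orders and the procedure dbo.spGetOrders are hypothetical names; substitute your own.

```sql
-- Hypothetical object names: replace dbo.Orders and dbo.spGetOrders with your own.

-- Rebuild the statistics on one table by reading every row
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
GO

-- Or refresh statistics across the whole current database
-- (sp_updatestats only touches statistics whose rows have changed,
--  and uses the default sampling rate)
EXEC sp_updatestats;
GO

-- Mark the procedure so a new plan is compiled on its next execution
EXEC sp_recompile N'dbo.spGetOrders';
GO
```

FULLSCAN is the slower but most accurate option; on tables of a few thousand rows the cost is negligible.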

Dave Markle
+3  A: 

Is the data (and the data size) on your mirror as close as possible to production? If you know why one query is taking longer than the other, can you post some more details?

Execution plans can differ in such cases because of the data in the tables and/or the statistics. Even when auto update statistics is turned on, the statistics can get out of date (especially on very large tables), because an automatic update only fires once a threshold of row modifications has been reached. For example, the optimizer may estimate that a table is much smaller than it really is and opt for a table scan.
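One way to check whether statistics are stale is to look at when each one was last updated and how many rows have changed since. A sketch for SQL Server 2005, assuming a hypothetical table dbo.Orders; rowmodctr in the sys.sysindexes compatibility view is only an approximate counter.

```sql
-- dbo.Orders is a hypothetical table name.

-- When was each statistics object on the table last updated?
SELECT  s.name                               AS stats_name,
        s.auto_created,
        STATS_DATE(s.object_id, s.stats_id)  AS last_updated
FROM    sys.stats AS s
WHERE   s.object_id = OBJECT_ID(N'dbo.Orders')
ORDER BY last_updated;

-- Approximate number of row modifications since the last statistics update
SELECT  name, rowcnt, rowmodctr
FROM    sys.sysindexes
WHERE   id = OBJECT_ID(N'dbo.Orders');
```

Running both queries on production and on the mirror and comparing the dates is a quick way to spot a server that has missed its maintenance.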

Nick Kavadias
There is not much difference in table size, as it is a recent copy. The tables themselves have around 2000 rows.
Robert Wagner
+3  A: 

Most likely statistics.

Some thoughts: do you do maintenance on your non-prod systems (e.g. rebuilding indexes, which also updates their statistics)?

If so, do you use the same fillfactor and statistics sample ratio?

Do you restore the database regularly onto test so it's 100% like production?
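To compare fill factor and statistics sample ratio between servers, something like the following could be run on each one and the output diffed. dbo.Orders and IX_Orders_CustomerId are hypothetical names used only for illustration.

```sql
-- dbo.Orders and IX_Orders_CustomerId are hypothetical names.

-- Fill factor per index (0 and 100 both mean pages are filled completely)
SELECT  i.name AS index_name,
        i.fill_factor
FROM    sys.indexes AS i
WHERE   i.object_id = OBJECT_ID(N'dbo.Orders');

-- Statistics header: compare "Rows" with "Rows Sampled" to see the sample ratio,
-- and "Updated" to see when the statistics were last refreshed
DBCC SHOW_STATISTICS ('dbo.Orders', IX_Orders_CustomerId) WITH STAT_HEADER;
```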

gbn
It turns out that it was prod that didn't have maintenance done on it.
Robert Wagner
+2  A: 

Provided there is no WITH RECOMPILE option on your proc, the execution plan will get cached after the first execution.

Here is a trivial example of how you can get the wrong query plan cached:

create proc spTest
    @id int 
as 
-- Compare against the parameter @id, not the column with itself
select * from sysobjects where @id is null or id = @id 

go 

exec spTest null
-- As expected, it's a clustered index scan

go

exec spTest 1
-- Oh no, it's still a clustered index scan (the plan cached by the first call is reused)

Try running your SQL in Query Analyzer on the production server, outside of the stored proc, to determine whether you have an issue with out-of-date statistics or with indexes mysteriously missing from production.
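A sketch of capturing I/O, timing and the actual execution plan for such an ad hoc run, so the same batch can be compared on production and on the mirror. The SELECT below is the one from the spTest example; paste in the body of your own procedure instead.

```sql
-- Capture I/O, timing and the actual execution plan for an ad hoc run.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SET STATISTICS XML ON;   -- returns the actual plan as an extra result set
GO

-- Body of the stored procedure, with literal values in place of parameters
SELECT * FROM sysobjects WHERE id = 1;
GO

SET STATISTICS XML OFF;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
GO
```

Using literals rather than local variables matters here: the optimizer can see literal values when compiling (much as it sniffs procedure parameters), but it generally cannot see the values held in local variables.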

Sam Saffron
As mentioned, we have tried to refresh the query plan. We also tried running the query outside the SProc, with the same results (pointing away from a bad cached query plan). At least something is eliminated.
Robert Wagner
Yerp, then you have an issue with your stats, data or indexes. If I were you, I would grab a backup of your production server and run diagnostics locally...
Sam Saffron
+2  A: 

Tying in to the first answer, the problem may lie with SQL Server's parameter sniffing feature. When a procedure is compiled, the optimizer uses the parameter values from the call that triggered the compilation to build the execution plan. Usually this is good, but if those values are atypical, the cached plan can be a poor fit for normal calls. This would also explain the difference between production and testing.

Turning off parameter sniffing would require modifying the SProc, which I understand is undesirable. However, after running sp_recompile, pass in parameters that you'd consider "normal" and the procedure should be recompiled based on these new parameters.
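A sketch of that sequence, followed by a query against the plan-cache DMVs (available from SQL Server 2005) to confirm which plan is now cached. The procedure dbo.spGetOrders and the parameter @CustomerId = 42 are hypothetical; use your own procedure and a value you consider typical.

```sql
-- dbo.spGetOrders and @CustomerId = 42 are hypothetical.
EXEC sp_recompile N'dbo.spGetOrders';
GO

-- The first call after sp_recompile compiles and caches the plan
-- using these parameter values.
EXEC dbo.spGetOrders @CustomerId = 42;
GO

-- Inspect the plan that is now cached for the procedure.
SELECT  qs.execution_count,
        qs.total_elapsed_time,
        qp.query_plan
FROM    sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle)    AS st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE   st.dbid     = DB_ID()
  AND   st.objectid = OBJECT_ID(N'dbo.spGetOrders');
```

Clicking the query_plan value in Management Studio opens the XML plan, which can then be compared with the plan cached on the other server.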

I think the parameter sniffing behavior differs between 2005 and 2008, so this may not work.

colithium
As mentioned, we did do this, with no success. However, it's good detail on how to test it.
Robert Wagner
I was picturing running sp_recompile and then passing the same unusual values again, which would produce the same bad plan. It was a long shot anyway.
colithium
A: 

The solution was to recalculate the statistics. I overlooked that because we usually have scheduled tasks to do all of that, but for some reason the admins didn't set one up on this server. Doh.

To summarize all the posts:

  • Check that the setup is the same
    • Indexes
    • Table sizes
    • Restore the production database onto test regularly
  • Execution plan caching
    • If the query behaves the same outside the SProc, it's not a cached execution plan
    • If it behaves differently, use sp_recompile
    • Parameter sniffing
  • Recompute statistics
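For the "Indexes" check, a sketch of a query that lists every index and its columns on user tables, so the output from production and from the mirror can be diffed. It only reads the SQL Server 2005 catalog views; heaps have no index columns and so do not appear.

```sql
-- List indexes and their key/included columns on all user tables.
SELECT  OBJECT_NAME(i.object_id)  AS table_name,
        i.name                    AS index_name,
        i.type_desc,
        i.is_unique,
        ic.key_ordinal,
        c.name                    AS column_name,
        ic.is_descending_key,
        ic.is_included_column
FROM    sys.indexes       AS i
JOIN    sys.index_columns AS ic ON ic.object_id = i.object_id
                               AND ic.index_id  = i.index_id
JOIN    sys.columns       AS c  ON c.object_id  = ic.object_id
                               AND c.column_id  = ic.column_id
WHERE   OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY table_name, index_name, ic.is_included_column, ic.key_ordinal;
```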
Robert Wagner