I'm maintaining stored procedures for SQL Server 2005, and I wish I could use a feature that's new in 2008: the OPTIMIZE FOR UNKNOWN query hint.

It seems as though the following query (written for SQL Server 2005) estimates the same number of rows (i.e. selectivity) as if OPTION (OPTIMIZE FOR UNKNOWN) were specified:

CREATE PROCEDURE SwartTest(@productid INT)
AS  
DECLARE @newproductid INT
SET @newproductid = @productid

SELECT ProductID 
FROM Sales.SalesOrderDetail 
WHERE ProductID = @newproductid

This query avoids parameter sniffing by declaring and setting a new variable. Is this really a SQL Server 2005 work-around for the OPTIMIZE-FOR-UNKNOWN feature? Or am I missing something? (Authoritative links, answers or test results are appreciated).
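
For comparison, here's roughly what I'd write on SQL Server 2008 with the hint (the procedure name below is just a placeholder for illustration):

-- Sketch only (SwartTest2008 is a hypothetical name): the 2008 form
-- that the parameter-masking workaround above is being compared against.
CREATE PROCEDURE SwartTest2008(@productid INT)
AS
SELECT ProductID
FROM Sales.SalesOrderDetail
WHERE ProductID = @productid
OPTION (OPTIMIZE FOR UNKNOWN)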

More Info: A quick test on SQL Server 2008 tells me that the number of estimated rows in this query is in fact the same as if OPTIMIZE FOR UNKNOWN were specified. Is this the same behavior on SQL Server 2005? I think I remember hearing once that, without more info, the SQL Server optimizer has to guess at the selectivity of the parameter (usually at 10% for inequality predicates). I'm still looking for definitive info on SQL 2005 behavior, though I'm not sure that info exists.

More Info 2: To be clear, this question is asking for a comparison of the UNKNOWN query hint and the parameter-masking technique I describe.

It's a technical question, not a problem-solving question. I considered a lot of other options and settled on this one. So the only goal of this question was to help me gain some confidence that the two methods are equivalent.

+1  A: 

Interesting question.

There's a good article on the SQL Programming and API Development Team blog here which lists the pre-SQL 2008 workaround solutions as (a couple of these are sketched after the list):

  1. use RECOMPILE hint so the query is recompiled every time
  2. unparameterise the query
  3. give specific values in OPTIMIZE FOR hint
  4. force use of a specific index
  5. use a plan guide
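
To sketch a couple of these against your query (treat these as illustrations only; 999 is just an example value):

-- Workaround 1: recompile the statement on every execution
SELECT ProductID
FROM Sales.SalesOrderDetail
WHERE ProductID = @productid
OPTION (RECOMPILE)

-- Workaround 3: compile the plan for a specific, representative value (999 is illustrative)
SELECT ProductID
FROM Sales.SalesOrderDetail
WHERE ProductID = @productid
OPTION (OPTIMIZE FOR (@productid = 999))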

Which leads me on to this article, which mentions your workaround of using local variables and how it generates an execution plan based on statistics. How similar that process is to the new OPTIMIZE FOR UNKNOWN hint, I don't know. My hunch is that it's a reasonable workaround.

AdaTheDev
I have the same hunch. And your "I don't know" is the question I'm asking... But thanks for taking the time to reply.
Michael J Swart
+2  A: 

I've used that solution several times recently to avoid parameter sniffing on SQL 2005, and it seems to me to do the same thing as OPTIMIZE FOR UNKNOWN on SQL 2008. It's fixed a lot of problems we had with some of our bigger stored procedures sometimes just hanging when passed certain parameters.

Gavin Draper
That's encouraging. It seems to be the same thing to me too. I wonder if there's a way to find out for sure.
Michael J Swart
+2  A: 

Okay, so I've done some experimenting. I'll write up the results here, but first I want to say that, based on what I've seen and know, I'm confident that using local variables this way in 2005 and 2008 is exactly equivalent to using 2008's OPTIMIZE FOR UNKNOWN, at least in the context of stored procedures.

So this is what I've found. The procedure above uses the AdventureWorks database (but I use similar methods and get similar results for any other database). I ran:

dbcc show_statistics ('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID) 

And I see statistics with 200 steps in the histogram. Looking at the histogram, I see that there are 66 distinct range rows (i.e., 66 distinct values that aren't represented in the stats as equality values). Add the 200 equality values (one from each step), and I get an estimate of 266 distinct values for ProductID in Sales.SalesOrderDetail.

With 121317 rows in the table, I can estimate that each ProductID has about 456 rows on average (121317 / 266 ≈ 456.08). And when I look at the query plan for my test procedure (in XML format), I see something like:

...
<QueryPlan DegreeOfParallelism="1"  >
  <RelOp NodeId="0" 
         PhysicalOp="Index Seek" 
         LogicalOp="Index Seek" 
         EstimateRows="456.079" 
         TableCardinality="121317"  />
    ...
  <ParameterList>
    <ColumnReference 
      Column="@newproductid" 
      ParameterRuntimeValue="(999)" />
  </ParameterList>
</QueryPlan>
...       

So I know where the EstimateRows value comes from (accurate to three decimals). Notice also that the ParameterCompiledValue attribute is missing from the query plan. This is exactly what a plan looks like when using 2008's OPTIMIZE FOR UNKNOWN.
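
As a further check (a sketch I'd expect to confirm this, assuming the density vector is what drives the estimate), DBCC SHOW_STATISTICS can return just the density vector, and multiplying the ProductID density by the table cardinality should land on the same figure:

-- Sketch: pull only the density vector for the same statistics object
DBCC SHOW_STATISTICS ('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID)
    WITH DENSITY_VECTOR
-- "All density" for ProductID should be roughly 1/266 = 0.00376,
-- and 0.00376 * 121317 = ~456, matching the EstimateRows value above.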

Michael J Swart
A: 

I've been using this parameter masking technique for at least the past year because of odd performance problems, and it has worked well, but it is VERY annoying to have to do all the time.

I have ALSO been using WITH RECOMPILE.

I do not have controlled tests because I can't selectively turn each technique on and off automatically in the system, but I suspect that parameter masking only helps IF the parameter is used. I have some complex SPs where the parameter is not used in every statement, and I expect that WITH RECOMPILE was still necessary because some of the "temporary" work tables are not populated the same way on every run (or even indexed identically, if I'm trying to tune), and some later statements don't rely on the parameter once the work tables are already appropriately populated. I have broken some processes up into multiple SPs precisely so that work done to populate a work table in one SP can be properly analyzed and executed against WITH RECOMPILE in the next SP.
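
For anyone who hasn't used it, procedure-level recompilation looks like this (a minimal sketch against the question's query; the procedure name is hypothetical):

-- Sketch only: SwartTestRecompile is a hypothetical name.
-- WITH RECOMPILE forces a fresh plan on every execution instead of masking the parameter.
CREATE PROCEDURE SwartTestRecompile(@productid INT)
WITH RECOMPILE
AS
SELECT ProductID
FROM Sales.SalesOrderDetail
WHERE ProductID = @productid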

Cade Roux
AdaTheDev suggested using the RECOMPILE hint. I decided against it because of my particular situation (high volume, large sproc, etc.). But thanks for the technique, it's good to know. I'll put it into my mental toolbox.
Michael J Swart