(At first glance this may look like a duplicate of http://stackoverflow.com/questions/421275 or http://stackoverflow.com/questions/414336, but my actual question is a bit different)

Alright, this one's had me stumped for a few hours. My example here is ridiculously abstracted, so I doubt it will be possible to recreate locally, but it provides context for my question (Also, I'm running SQL Server 2005).

I have a stored procedure with basically two steps: constructing a temp table and populating it with very few rows, and then querying a very large table joined against that temp table. It has multiple parameters, but the most relevant is a datetime, @MinDate. Essentially:

create table #smallTable (ID int)

insert into #smallTable
select (a very small number of rows from some other table)

select * from aGiantTable
inner join #smallTable on #smallTable.ID = aGiantTable.ID
inner join anotherTable on anotherTable.GiantID = aGiantTable.ID
where aGiantTable.SomeDateField > @MinDate

If I just execute this as a normal query, by declaring @MinDate as a local variable and running that, it produces an optimal execution plan that executes very quickly (first joins on #smallTable and then only considers a very small subset of rows from aGiantTable while doing other operations). It seems to realize that #smallTable is tiny, so it would be efficient to start with it. This is good.

However, if I make that a stored procedure with @MinDate as a parameter, it produces a completely inefficient execution plan. (I am recompiling it each time, so it's not a bad cached plan...at least, I sure hope it's not)

But here's where it gets weird. If I change the proc to the following:

declare @LocalMinDate datetime
set @LocalMinDate = @MinDate --where @MinDate is still a parameter

create table #smallTable (ID int)

insert into #smallTable
select (a very small number of rows from some other table)

select * from aGiantTable
inner join #smallTable on #smallTable.ID = aGiantTable.ID
inner join anotherTable on anotherTable.GiantID = aGiantTable.ID
where aGiantTable.SomeDateField > @LocalMinDate

Then it gives me the efficient plan!


So my theory is this: when executing as a plain query (not as a stored procedure), it waits to construct the execution plan for the expensive query until the last minute, so the query optimizer knows that #smallTable is small and uses that information to give the efficient plan.

But when executing as a stored procedure, it creates the entire execution plan at once, thus it can't use this bit of information to optimize the plan.

But why does using the locally declared variables change this? Why does that delay the creation of the execution plan? Is that actually what's happening? If so, is there a way to force delayed compilation (if that indeed is what's going on here) even when not using local variables in this way?

More generally, does anyone have sources on when the execution plan is created for each step of a stored procedure? Googling hasn't provided any helpful information, but I don't think I'm looking for the right thing. Or is my theory just completely unfounded?

Edit: Since posting, I've learned of parameter sniffing, and I assume this is what's causing the execution plan to compile prematurely (unless stored procedures indeed compile all at once), so my question remains -- can you force the delay? Or disable the sniffing entirely?

The question is academic, since I can force a more efficient plan by replacing the select * from aGiantTable with

select * from (select * from aGiantTable where ID in (select ID from #smallTable)) as aGiantTable

Or just sucking it up and masking the parameters, but still, this inconsistency has me pretty curious.


tl;dr

This is an egregiously long question, so in brief:

Is the full execution plan created when the stored procedure is first called, or as it executes? That is, if a stored procedure consists of multiple steps, is the execution plan for each step created when the procedure is first called, or is it only created after past steps have finished executing (again, the first time it's called)?

+1  A: 

Some additional articles for you to look at:

http://blogs.msdn.com/queryoptteam/archive/2006/03/31/565991.aspx
http://sqlblog.com/blogs/ben_nevarez/archive/2009/08/27/the-query-optimizer-and-parameter-sniffing.aspx

Note that you could also use the "recompile" query option to work around "parameter sniffing".
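
For reference, a rough sketch of what the statement-level hint looks like, reusing the table names from the question. The procedure name dbo.GetGiantRows and the source table dbo.someOtherTable are made up for illustration, and the sketch does not settle whether the hint helps in the asker's situation:

```sql
-- Sketch only: dbo.GetGiantRows and dbo.someOtherTable are hypothetical names.
CREATE PROCEDURE dbo.GetGiantRows
    @MinDate datetime
AS
BEGIN
    CREATE TABLE #smallTable (ID int);

    -- Stands in for the question's "very small number of rows" source.
    INSERT INTO #smallTable (ID)
    SELECT ID FROM dbo.someOtherTable;

    SELECT *
    FROM aGiantTable
    INNER JOIN #smallTable ON #smallTable.ID = aGiantTable.ID
    INNER JOIN anotherTable ON anotherTable.GiantID = aGiantTable.ID
    WHERE aGiantTable.SomeDateField > @MinDate
    OPTION (RECOMPILE);  -- recompile only this statement each time it runs
END
```

OPTION (RECOMPILE) applies only to the statement it is attached to, whereas WITH RECOMPILE on the procedure definition recompiles the whole procedure on every call.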

etliens
I've read the sources you linked. I don't think you understand about the recompilation -- that's what's *causing* the problem in the first place. Like I said, I am already recompiling the query with each execution. If I cached a valid plan it would provide a workaround, but this doesn't solve the underlying problem.
Ian Henry
Are you using the same parameter values every time for your testing? Btw: sprocs are initially compiled on first execution, and the plan is optimized for the values that were passed into any input parameters.
etliens
+1  A: 

This is parameter sniffing and if you don't have SQL Server 2008 and OPTIMIZE FOR UNKNOWN, then masking parameters with local variables (as you have found) is your best bet.
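
A sketch of how the hint could be attached, reusing the question's table names. The procedure name, dbo.someOtherTable, and the date literal are placeholders made up for illustration, not values from this thread:

```sql
-- Sketch only: dbo.GetGiantRowsHinted, dbo.someOtherTable and the date literal are hypothetical.
CREATE PROCEDURE dbo.GetGiantRowsHinted
    @MinDate datetime
AS
BEGIN
    CREATE TABLE #smallTable (ID int);

    -- Stands in for the question's "very small number of rows" source.
    INSERT INTO #smallTable (ID)
    SELECT ID FROM dbo.someOtherTable;

    SELECT *
    FROM aGiantTable
    INNER JOIN #smallTable ON #smallTable.ID = aGiantTable.ID
    INNER JOIN anotherTable ON anotherTable.GiantID = aGiantTable.ID
    WHERE aGiantTable.SomeDateField > @MinDate
    -- SQL Server 2008 and later: use statistical data instead of the sniffed value.
    OPTION (OPTIMIZE FOR (@MinDate UNKNOWN));
    -- SQL Server 2005 alternative: optimize for a fixed literal of your choosing instead:
    -- OPTION (OPTIMIZE FOR (@MinDate = '20090101'));
END
```

Only one OPTION clause can be attached to the statement, so the two variants are alternatives rather than something to combine.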

Cade Roux
Well, at least there's a workaround in 2008. I wound up using an optimize hint with a value I found to produce the optimal plan, which gave me pretty much the same effect as that.
Ian Henry