I have an ETL process that involves a stored procedure that makes heavy use of SELECT INTO statements (minimally logged, and therefore faster, as they generate less log traffic). Of the batch of work that takes place in one particular stored procedure, several of the most expensive operations are eager spools that appear to just buffer the query results and then copy them into the table being created.

The MSDN documentation on eager spools is quite sparse. Does anyone have deeper insight into whether these are really necessary, and under what circumstances? I have a few theories that may or may not make sense, but I have had no success in eliminating the spools from the queries.

The .sqlplan files are quite large (160kb) so I guess it's probably not reasonable to post them directly to a forum.

So, here are some theories that may be amenable to specific answers:

o The query uses some UDFs for data transformation, such as parsing formatted dates. Does this data transformation necessitate the use of eager spools so that sensible types (e.g. varchar lengths) can be allocated to the table before it is constructed?

o As an extension of the question above, does anyone have a deeper view of what does or does not drive this operation in a query?

Nigel

+5  A: 

My understanding of spooling is that it's a bit of a red herring on your execution plan. Yes, it accounts for a lot of your query cost, but it's actually an optimization that SQL Server undertakes automatically so that it can avoid costly rescanning. If you were to avoid spooling, the cost of the execution tree it sits on would go up, and almost certainly the cost of the whole query would increase. I don't have any particular insight into what might cause the query optimizer to build the plan that way, especially without seeing the SQL code, but you're probably better off trusting its behavior.
However, that doesn't mean your execution plan can't be optimized, depending on exactly what you're up to and how volatile your source data is. When you're doing a SELECT INTO, you'll often see spooling items on your execution plan, and it can be related to read isolation. If it's appropriate for your particular situation, you might try just lowering the transaction isolation level to something less costly, and/or using the NOLOCK hint. I've found in complicated performance-critical queries that NOLOCK, if safe and appropriate for your data, can vastly increase the speed of query execution even when there doesn't seem to be any reason it should.
In this situation, if you try READ UNCOMMITTED or the NOLOCK hint, you may be able to eliminate some of the Spools.
(Obviously you don't want to do this if it's likely to land you in an inconsistent state, but everyone's data isolation requirements are different)
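As a rough sketch of the two options above, assuming made-up object names (staging.Orders, dbo.OrdersWork, dbo.OrdersWork2 and the columns are all hypothetical), it would look something like this. Whether either variant actually removes a spool is something you'd have to confirm by comparing the plans before and after.

```sql
-- Option 1: lower the isolation level for the whole session/batch.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT s.OrderId, s.CustomerId, s.OrderDateText
INTO   dbo.OrdersWork              -- hypothetical target, created by SELECT INTO
FROM   staging.Orders AS s;        -- hypothetical staging table

-- Restore the default so later statements are unaffected.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- Option 2: apply the hint to a single table reference only.
SELECT s.OrderId, s.CustomerId, s.OrderDateText
INTO   dbo.OrdersWork2             -- hypothetical target
FROM   staging.Orders AS s WITH (NOLOCK);
```

The table hint is the narrower of the two: it only affects the one table it is attached to, whereas the SET statement applies to everything that follows in the session until it is changed back.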
The TOP operator and the OR operator can occasionally cause spooling, but I doubt you're doing any of those in an ETL process...
You're right in saying that your UDFs could also be the culprit. If you're only using each UDF once, it would be an interesting experiment to try putting them inline to see if you get a large performance benefit. (And if you can't figure out a way to write them inline with the query, that's probably why they might be causing spooling)
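To illustrate the inlining experiment, here is a minimal sketch. dbo.ParseDate is a hypothetical scalar UDF standing in for whatever date-parsing function you have, the tables and columns are invented, and the 'yyyymmdd' input format is purely an assumption for the example.

```sql
-- Before: scalar UDF called once per row (dbo.ParseDate is hypothetical).
SELECT s.OrderId,
       dbo.ParseDate(s.OrderDateText) AS OrderDate
INTO   dbo.OrdersWork_Udf
FROM   staging.Orders AS s;

-- After: the same transformation written inline.
-- Assumes OrderDateText holds 'yyyymmdd' strings (CONVERT style 112).
SELECT s.OrderId,
       CONVERT(datetime, s.OrderDateText, 112) AS OrderDate
INTO   dbo.OrdersWork_Inline
FROM   staging.Orders AS s;
```

Comparing the two actual execution plans side by side should show whether the UDF has anything to do with the spool.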
One last thing I would look at is that, if you're doing any joins that can be re-ordered, try using a hint to force the JOIN order to happen in what you know to be the most selective order. That's a bit of a reach but it doesn't hurt to try it if you're already stuck optimizing.
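A sketch of what forcing the join order might look like, using the FORCE ORDER query hint. The tables and columns are hypothetical; the only point is that, with the hint, the joins are performed in the order they are written in the FROM clause.

```sql
SELECT o.OrderId, c.CustomerName
INTO   dbo.OrdersEnriched          -- hypothetical target table
FROM   staging.Orders    AS o      -- written first, so joined first
JOIN   staging.Customers AS c      -- hypothetical lookup table
       ON c.CustomerId = o.CustomerId
OPTION (FORCE ORDER);              -- keep the join order as written
```

Since the hint overrides the optimizer's own choice, it's worth re-checking the plan whenever the data volumes change.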

Grank
Read isolation may well be applicable as the process queries from a staging area copied from the source. Additionally, even if this does not fix my particular issue it adds a bit of insight as this is not mentioned in any of the MSDN literature that I could find concerning eager spool operations.
ConcernedOfTunbridgeWells
I'm glad it was some help. We might be able to help you further if you posted the SQL code in question (genericized if necessary, of course).
Grank