views:

556

answers:

5

I have a complex query that I need to use in a subsequent query (actually update statement). I have tried both using a CTE and a temp table. The performance using the CTE is horrible vs the temp table approach. Its something like 15 seconds vs milliseconds. To simplify the test instead of joining the CTE/Temp table in the subsequent query I simply selected * from it. In that case they perform the same.

I Have Looked at The Execution Plan for both approaches both with the joins in the subsequent query and then simply select *. With the simple select the query plans are about the same, but with the joins in the subsequent select the query plans are not. Specifically the portion of the query plan for creating and populating the temp table stays the same, while the query plan portion for creating and populating the CTE changes dramatically when it is subsequently used in a query with a join.

My question is why does the query plan for the creation and population of the CTE change by how it is subsequently used while the temp table is not. Also in what scenarios then would a CTE yield better performance than a temp table?

*Note I have used a table variable as well and it is comparable to the temp table approach.

Thanks

+1  A: 

CTE is just an alias for the query.

It may (or may not) be rerun each time it's used.

There is no clean way to force CTE materialization in SQL Server (like Oracle's /*+ MATERIALIZE */), and you have to do dirty tricks like this:

CTE may improve performance if used in plans requiring only one evaluation (like HASH JOIN, MERGE JOIN etc.).

In these scenarios, the hash table will be built right from the CTE, while using the temp table will require evaluating the CTE, pulling the results into the temp table and reading the temp table once again.

Quassnoi
+1  A: 

You're asking a complicated question, so you're getting a complicated answer: it depends. (I hate that response).

Seriously, however, it has to do with how the optimizer chooses a data plan (which you knew already); a temp table or variable is like a permanent structure in that an execution plan will perform the operation associated with filling that structure first, and then use that structure in subsequent operations. A CTE is NOT a temp table; use of the CTE is not calculated until it is being used by subsequent operations, and so that usage impacts how the plan is optimized.

CTE's were implemented for reusability and maintenance issues, not necessarily performance; however, in many cases (like recursion), they will perform better than traditional coding methods.

Stuart Ainsworth
A: 

Further to the comments provided by Quassnoi,

If I recall correctly from a Performance Tuning presentation given be Kevin Kline, a CTE actually creates temporary table data structures in the background however, the implementation used is naturally generalised and therefore may not be as optimised as an explicit implementation of your own design.

For this reason using an explicit temporary table of your own design allows you greater control, i.e. you can add indexes to the temporary table that directly address the needs of your query for example.

John Sansom
A `CTE` *may* create a temporary table (which you will then see as a `Worktable` in the plan), but it is not guaranteed to do that. If you run `WITH q AS (SELECT NEWID() AS n) SELECT * FROM q UNION ALL SELECT * FROM q`, this will give you two different newid's
Quassnoi
A: 

I find that typically a repeated CTE gets no performance improvements.

So for instance, if you use a CTE to populate a table and then the same CTE to join to in a later query, no benefit. Unfortunately, CTEs are not snapshots and they literally have to be repeated to be used in two separate statements, so they tend to be evaluated twice.

Instead of CTEs, I often use inline TVFs (which may contain CTEs), which allows proper re-use, and are not any better or worse than CTEs in my SPs.

In addition, I also find that the execution plan can be bad if the first step alters the statistics such that the execution plan for the second step is always inaccurate because it is evaluated before any steps are run.

In this case, I look at manually storing intermediate results, ensuring that they are indexed properly and splitting up the process into multiple SPs and adding WITH RECOMPILE to ensure that later SPs have plans that are good for the data which they are actually going to operate on.

Cade Roux
A: 

I tried creating CTE with simple selected with filter from big table Then 3 times subquired it. After that do the same with temporaly tables. The result was 70% time consuming for CTE -30% time consuming for temp table. So temp table is better for that solutions. I don't think CTE makes a temp table only with selected query ,but 3 times make select to a big table.

Anton