views:

594

answers:

7

I have a query that looks like this:

SELECT
 P.Column1,
 P.Column2,
 P.Column3,
 ...
 (
   SELECT
       A.ColumnX,
       A.ColumnY,
       ...
   FROM
      dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A
   WHERE
      A.Key = P.Key
   FOR XML AUTO, TYPE  
 ),
 (
   SELECT
       B.ColumnX,
       B.ColumnY,
       ...
   FROM
      dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B
   WHERE
      B.Key = P.Key
   FOR XML AUTO, TYPE  
 )
FROM
(
   <joined tables here>
) AS P
FOR XML AUTO, ROOT('ROOT')

P has ~5,000 rows; A and B have ~4,000 rows each.

This query has a runtime of 10+ minutes.

Changing it to this, however:

SELECT
 P.Column1,
 P.Column2,
 P.Column3,
 ...
INTO #P
FROM
(
   <joined tables here>
) AS P

SELECT
 A.ColumnX,
 A.ColumnY,
 ...
INTO #A     
FROM
 dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A

SELECT
 B.ColumnX,
 B.ColumnY,
 ...
INTO #B     
FROM
 dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B


SELECT
 P.Column1,
 P.Column2,
 P.Column3,
 ...
 (
   SELECT
       A.ColumnX,
       A.ColumnY,
       ...
   FROM
      #A AS A
   WHERE
      A.Key = P.Key
   FOR XML AUTO, TYPE  
 ),
 (
   SELECT
       B.ColumnX,
       B.ColumnY,
       ...
   FROM
      #B AS B
   WHERE
      B.Key = P.Key
   FOR XML AUTO, TYPE  
 )
FROM #P AS P
FOR XML AUTO, ROOT('ROOT')

This version runs in ~4 seconds.

This doesn't make much sense, since it would seem the cost of inserting into a temp table and then doing the join should be higher by default. My inclination is that SQL Server is doing the wrong type of "join" with the subquery, but as far as I can tell there is no way to specify the join type to use with correlated subqueries.

Is there a way to achieve this performance without using #temp tables or @table variables, via indexes and/or hints?

EDIT: Note that dbo.TableReturningFunc1 and dbo.TableReturningFunc2 are inline TVFs, not multi-statement ones; they are essentially "parameterized" views.
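
For reference, an inline TVF consists of a single RETURN SELECT that the optimizer can expand into the calling query, much like a view. A minimal sketch of the shape, using a hypothetical source table and filter columns:

CREATE FUNCTION dbo.TableReturningFunc1 (@StaticParam1 INT, @StaticParam2 INT)
RETURNS TABLE
AS
RETURN
(
    -- A single SELECT with no intermediate table variable:
    -- the optimizer can inline this like a parameterized view.
    SELECT  S.[Key],
            S.ColumnX,
            S.ColumnY
    FROM    dbo.SomeSourceTable AS S   -- hypothetical table name
    WHERE   S.Filter1 = @StaticParam1  -- hypothetical filter columns
      AND   S.Filter2 = @StaticParam2
);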

+8  A: 

Your functions are being re-evaluated for each row in P.

What you do with the temp tables is, in fact, caching the result sets generated by the functions, thus removing the need to re-evaluate them.

Inserting into a temp table is fast because tempdb operations are minimally logged: no redo information needs to be kept, since tempdb is rebuilt on restart anyway.

Joins are also fast, since having a stable result set allows the optimizer to create a temporary index with an Eager Spool or a Worktable.

You can reuse the functions without temp tables using CTEs, but for this to be efficient, SQL Server needs to materialize the results of the CTEs.

You may try to force it to do this by using TOP together with an ORDER BY inside the CTEs:

WITH    f1 AS
        (
        SELECT  TOP 1000000000
                A.ColumnX,
                A.ColumnY
        FROM    dbo.TableReturningFunc1(@StaticParam1, @StaticParam2) AS A
        ORDER BY
A.Key
        ),
        f2 AS
        (
        SELECT  TOP 1000000000
                B.ColumnX,
B.ColumnY
        FROM    dbo.TableReturningFunc2(@StaticParam1, @StaticParam2) AS B  
        ORDER BY
                B.Key
        )
SELECT  …

This may result in an Eager Spool being generated by the optimizer.

However, this is far from being guaranteed.

The guaranteed way is to add an OPTION (USE PLAN) hint to your query and wrap the corresponding CTE into the Spool clause.

See this entry in my blog on how to do that.

This is hard to maintain, since you will need to regenerate the plan each time you rewrite the query, but it works well and is quite efficient.
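
For reference, the hint has roughly this shape. This is a sketch only: the XML literal must be the complete plan (captured, for example, with SET SHOWPLAN_XML ON), not the abbreviated stub shown here.

SELECT  ...
FROM    ...
OPTION  (USE PLAN N'<ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" ...>
        <!-- the full estimated plan, including the Spool operator, goes here -->
        </ShowPlanXML>');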

Using the temp tables will be much easier, though.
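
If you do stay with the temp tables, it usually also pays to index the join key, since each correlated FOR XML subquery seeks on it. A sketch, assuming Key is the join column as in the question (the index names are made up):

CREATE CLUSTERED INDEX IX_P_Key ON #P ([Key]);
CREATE CLUSTERED INDEX IX_A_Key ON #A ([Key]);
CREATE CLUSTERED INDEX IX_B_Key ON #B ([Key]);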

Quassnoi
How could I achieve this without doing the temp table work explicitly, if possible?
Joseph Kingry
Sure, it's possible; see the post update.
Quassnoi
Performance with CTEs: 1:28; temp table method: 0:04. Much better than 10+ minutes, but still orders of magnitude apart.
Joseph Kingry
Could you please post your stored procedures so that I can check performance?
Quassnoi
+2  A: 

The problem is that your sub-query references the outer query, meaning the sub-query has to be compiled and executed for each row in the outer query. Rather than using explicit temp tables, you can use a derived table. To simplify your example:

SELECT P.Column1,
       (SELECT [your XML transformation etc] FROM A where A.ID = P.ID) AS A

If P contains 10,000 records, then SELECT A.ColumnX FROM A WHERE A.ID = P.ID will be executed 10,000 times.
You can instead use a derived table, like this:

SELECT P.Column1, A2.Column FROM  
P LEFT JOIN 
 (SELECT A.ID, [your XML transformation etc] FROM A) AS A2 
 ON P.ID = A2.ID

Okay, the pseudo-code above is not that illustrative, but the basic idea is the same as with the temp table, except that SQL Server does the whole thing in memory: it first selects all the data in "A2" and builds a worktable in memory, then joins on it. This saves you having to SELECT it INTO a temp table yourself.

Just to give you an example of the principle in another context where it may make more immediate sense: consider employee and absence information, where you want to show the number of days of absence recorded for each employee.

Bad (runs as many queries as there are employees in the DB):

SELECT EmpName, 
 (SELECT SUM(absdays) FROM Absence where Absence.PerID = Employee.PerID) AS Abstotal        
FROM Employee

Good (runs only two queries):

SELECT EmpName, AbsSummary.Abstotal
FROM Employee LEFT JOIN
      (SELECT PerID, SUM(absdays) As Abstotal 
       FROM Absence GROUP BY PerID) AS AbsSummary
ON AbsSummary.PerID = Employee.PerID
Frans
This doesn't completely fix the issue I'm having. Using LEFT OUTER JOINs and converting to FOR XML PATH (to get proper nesting), I get a runtime of 1:26, compared to 0:04 for the temp table method.
Joseph Kingry
Ah, fair enough :)
Frans
Right, here is what I would do at this point: break your query down into its components, load each part up in Query Analyser, and select "show estimated execution plan". Then do the same thing for the whole query. It should give you a good idea of which step is slow and how the optimiser is interpreting your request. See if it is one of the subcomponents or only when it is all put together. Look for loops and scans. Most importantly, you will probably find one step that is responsible for almost all the execution time; see if you can optimise that.
Frans
+1  A: 

Consider using the WITH common_table_expression construct for what you now have as sub-selects or temporary tables, see http://msdn.microsoft.com/en-us/library/ms175972(SQL.90).aspx .

Alex Martelli
A: 

> This doesn't make much sense, since it would seem the cost of inserting into a temp table and then doing the join should be higher by default.

With temporary tables, you explicitly instruct SQL Server which intermediate storage to use. But if you stash everything in one big query, SQL Server will decide for itself. The difference is not really that big; at the end of the day, temporary storage is used whether you specify it as a temp table or not.

In your case, temporary tables work faster, so why not stick to them?

Andomar
A: 

There are several possible reasons why using intermediate temp tables might speed up a query, but the most likely in your case is that the functions being called (but not listed) are multi-statement TVFs rather than inline TVFs. Multi-statement TVFs are opaque to the optimization of their calling queries, so the optimizer cannot tell whether there are any opportunities for reuse of data, or for other logical/physical operator re-ordering optimizations. Thus, all it can do is re-execute the TVFs every time the containing query needs to produce another row with the XML columns.

In short, multi-statement TVFs frustrate the optimizer.
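
For contrast with an inline TVF, a multi-statement TVF declares a return table and fills it inside a body that the optimizer treats as a black box. A sketch with hypothetical names:

CREATE FUNCTION dbo.MultiStatementFunc (@StaticParam1 INT, @StaticParam2 INT)
RETURNS @Result TABLE ([Key] INT, ColumnX INT, ColumnY INT)
AS
BEGIN
    -- The optimizer cannot see into this body, so it has no cardinality
    -- estimate for @Result and cannot reorder operators around the call.
    INSERT INTO @Result ([Key], ColumnX, ColumnY)
    SELECT  S.[Key], S.ColumnX, S.ColumnY
    FROM    dbo.SomeSourceTable AS S   -- hypothetical table name
    WHERE   S.Filter1 = @StaticParam1
      AND   S.Filter2 = @StaticParam2;
    RETURN;
END;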

The usual solutions, in order of (typical) preference are:

  1. Re-write the offending multi-statement TVF to be an in-line TVF
  2. In-line the function code into the calling query, or
  3. Dump the offending TVF's data into a temp table, which is what you've done...
RBarryYoung
I would agree, except that the TVFs are inline, basically parameterized views. I will add that clarification to the question. Good suggestion in general, though.
Joseph Kingry
A: 

If creating a temp table with intermediate results is faster, why doesn't the optimizer just do it on its own? Isn't that the job of the optimizer?

All of the other answers so far talk about how to just get to the same point you're already at.

The query optimizer doesn't cross procedural boundaries; therefore, it can't know that this would be better in this case (it isn't always).
RBarryYoung
I have no idea what you're talking about. What procedural boundary? Writing intermediate results out to disk is merely a query engine operation.
A: 

If temp tables turn out to be faster in your particular instance, you should instead use a table variable.

There is a good article here on the differences and performance implications:

http://www.codeproject.com/KB/database/SQP_performance.aspx

ScottE
In SQL 2005 and above, temp tables are as fast as or faster than table variables the vast majority of the time. This is because table variables cannot have statistics on them, so to the query optimizer they appear to have only 1 row (look at the query plan). As such, they are optimized incorrectly and tend to perform poorly any time they have significantly more than, say, 10 rows. This article uses tests similar to the article that you reference, but shows what happens on 2005 when you put more rows in.
RBarryYoung
Oops, here's the article: http://www.sql-server-performance.com/articles/per/temp_tables_vs_variables_p1.aspx
RBarryYoung
Heh. Actually, they are both the same article!
RBarryYoung
I found this as well. Using table variables was not as performant as #temp tables.
Joseph Kingry