Hello,

I have written a table-valued UDF that starts with a CTE returning a subset of the rows from a large table. The CTE contains several joins: a couple of inner joins and one left join to other tables, none of which contain many rows. The CTE has a WHERE clause that restricts the rows to a date range, so that only the rows needed are returned.

I then reference this CTE in 4 self left joins, in order to build subtotals using different criteria.

The query is quite complex, but here is a simplified pseudo-version of it:

WITH DataCTE as
(
     SELECT [columns] FROM table
                      INNER JOIN table2
                      ON [...]

                      INNER JOIN table3
                      ON [...]

                      LEFT JOIN table3
                      ON [...]
)
SELECT [aggregates_columns of each subset] FROM DataCTE Main
LEFT JOIN DataCTE BananasSubset
               ON [...] 
             AND Product = 'Bananas'
             AND Quality = 100
LEFT JOIN DataCTE DamagedBananasSubset
               ON [...]
             AND Product = 'Bananas'
             AND Quality < 20
LEFT JOIN DataCTE MangosSubset
               ON [...]
GROUP BY [...]

I have the feeling that SQL Server gets confused and evaluates the CTE once for each self join. The execution plan seems to confirm this, although I confess I am no expert at reading those.

I would have assumed SQL Server to be smart enough to retrieve the CTE's data only once, rather than several times.

I have tried the same approach with one change: instead of using a CTE to get the subset of the data, I ran the same SELECT query as in the CTE but made it output to a temp table.

The version that references the CTE takes 40 seconds. The version that references the temp table takes between 1 and 2 seconds.
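For readers who want to see the shape of the temp table variant, here is a minimal sketch. It is not the original query: `#Source`, `#DataSubset`, the column names, the sample rows and the date range are all assumptions made up for illustration, and the self joins use a unique `RowId` so that each join matches at most one row.

```sql
-- Hypothetical stand-in for the large table; every name here is illustrative.
CREATE TABLE #Source
(
    RowId    INT         NOT NULL PRIMARY KEY,
    SaleDate DATETIME    NOT NULL,
    StoreId  INT         NOT NULL,
    Product  VARCHAR(50) NOT NULL,
    Quality  INT         NOT NULL,
    Amount   INT         NOT NULL
);

INSERT INTO #Source (RowId, SaleDate, StoreId, Product, Quality, Amount)
VALUES (1, '20240105', 1, 'Bananas', 100, 10),
       (2, '20240106', 1, 'Bananas',  10,  4),
       (3, '20240107', 1, 'Mangos',   80,  7),
       (4, '20240301', 1, 'Bananas', 100, 99); -- outside the date range below

DECLARE @From DATETIME = '20240101', @To DATETIME = '20240201';

-- Step 1: run the expensive filter (and, in the real query, the joins) once
-- and store the result.
SELECT RowId, StoreId, Product, Quality, Amount
INTO   #DataSubset
FROM   #Source
WHERE  SaleDate >= @From AND SaleDate < @To;

-- Step 2: every self join now reads the stored rows instead of
-- re-running the subset query.
SELECT    Main.StoreId,
          SUM(Main.Amount)    AS TotalAmount,
          SUM(Good.Amount)    AS GoodBananas,
          SUM(Damaged.Amount) AS DamagedBananas
FROM      #DataSubset AS Main
LEFT JOIN #DataSubset AS Good
       ON Good.RowId = Main.RowId
      AND Good.Product = 'Bananas'
      AND Good.Quality = 100
LEFT JOIN #DataSubset AS Damaged
       ON Damaged.RowId = Main.RowId
      AND Damaged.Product = 'Bananas'
      AND Damaged.Quality < 20
GROUP BY  Main.StoreId;

DROP TABLE #DataSubset;
DROP TABLE #Source;
```

The point of the sketch is only the two-step structure: the subset is computed once in step 1, and step 2 refers to it by name as many times as needed.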

Why isn't SQL Server smart enough to keep the CTE results in memory?

I like CTEs, especially in this case as my UDF is a table-valued one, so it allowed me to keep everything in a single statement.

To use a temp table, I would need to write a multi-statement table-valued UDF, which I find a slightly less elegant solution.

Have any of you had this kind of performance issue with CTEs, and if so, how did you sort it out?

Thanks,

Kharlos

A: 

I believe SQL Server re-evaluates a CTE each time it is referenced, rather than computing its results once and reusing them. A temp table, by contrast, stores its rows until it is dropped. That would explain the performance gain you saw when you switched to a temp table.

Another benefit is that you can create indexes on a temp table, which you can't do on a CTE. I'm not sure whether that would help in your situation, but it's good to know.
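As a rough illustration of indexing a temp table, the sketch below creates one and adds two indexes. The table, its columns, the sample rows and the index names are all hypothetical; which columns deserve an index depends on the real join and filter conditions.

```sql
-- Hypothetical temp table holding the date-filtered subset.
CREATE TABLE #DataSubset
(
    RowId   INT         NOT NULL,
    StoreId INT         NOT NULL,
    Product VARCHAR(50) NOT NULL,
    Quality INT         NOT NULL,
    Amount  INT         NOT NULL
);

INSERT INTO #DataSubset (RowId, StoreId, Product, Quality, Amount)
VALUES (1, 1, 'Bananas', 100, 10),
       (2, 1, 'Bananas',  10,  4),
       (3, 1, 'Mangos',   80,  7);

-- Index the column assumed to be used when joining the table to itself.
CREATE UNIQUE CLUSTERED INDEX IX_DataSubset_RowId
    ON #DataSubset (RowId);

-- Index the columns used as subset filters; INCLUDE adds the aggregated
-- column so the index alone can answer those lookups.
CREATE NONCLUSTERED INDEX IX_DataSubset_Product_Quality
    ON #DataSubset (Product, Quality)
    INCLUDE (Amount);

DROP TABLE #DataSubset;
```

Whether the optimizer actually uses these indexes is worth checking in the execution plan, since on a small subset a scan may still be cheaper.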

Related reading:

Quote from the last link:

The CTE's underlying query will be called each time it is referenced in the immediately following query.

I'd say go with the temp table. Unfortunately, the elegant solution isn't always the best one.

UPDATE:

Hmmm, that makes things more difficult. It's hard for me to say without looking at your whole environment.

Some thoughts:

  • Can you use a stored procedure instead of a UDF (instead of it, not called from within it)?
  • This may not be possible, but if you can remove the left join from your CTE, you could move the rest of that query into an indexed view. If you are able to do this, you may see performance gains over even the temp table.
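The stored procedure idea could look roughly like the sketch below. Everything named here is an assumption for illustration: `dbo.Sales`, `dbo.GetSubtotals`, the columns and the date parameters do not come from the question. `GO` is the batch separator understood by SSMS and sqlcmd, not a T-SQL statement.

```sql
-- Hypothetical source table so the sketch can run on its own.
CREATE TABLE dbo.Sales
(
    RowId    INT         NOT NULL PRIMARY KEY,
    SaleDate DATETIME    NOT NULL,
    StoreId  INT         NOT NULL,
    Product  VARCHAR(50) NOT NULL,
    Quality  INT         NOT NULL,
    Amount   INT         NOT NULL
);
GO

CREATE PROCEDURE dbo.GetSubtotals
    @From DATETIME,
    @To   DATETIME
AS
BEGIN
    SET NOCOUNT ON;

    -- Unlike a UDF, a stored procedure may create and use temp tables.
    SELECT RowId, StoreId, Product, Quality, Amount
    INTO   #DataSubset
    FROM   dbo.Sales
    WHERE  SaleDate >= @From AND SaleDate < @To;

    -- The self join reads the stored subset.
    SELECT    Main.StoreId,
              SUM(Main.Amount) AS TotalAmount,
              SUM(Good.Amount) AS GoodBananas
    FROM      #DataSubset AS Main
    LEFT JOIN #DataSubset AS Good
           ON Good.RowId = Main.RowId
          AND Good.Product = 'Bananas'
          AND Good.Quality = 100
    GROUP BY  Main.StoreId;
END;
GO

-- A caller that needs the rows as a table can capture them with INSERT ... EXEC.
CREATE TABLE #Result (StoreId INT, TotalAmount INT, GoodBananas INT);

INSERT INTO #Result (StoreId, TotalAmount, GoodBananas)
EXEC dbo.GetSubtotals @From = '20240101', @To = '20240201';

SELECT StoreId, TotalAmount, GoodBananas FROM #Result;
```

The trade-off is that a procedure cannot be referenced directly in a FROM clause or joined like a table-valued function, so the caller has to go through an intermediate table as shown.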
Abe Miessler
Thanks a lot for these great references. That's what I assumed then... I will have to drop elegance, as you say; having this UDF take 40 seconds is not an option. I hope it is going to be optimized in a future version of SQL Server though... Until then, I will probably tend to avoid CTEs... At least with a temporary table I can feel more in control of what's going on behind the scenes. Thanks again.
Kharlos Dominguez
Argh... I had forgotten that even a multi-statement UDF cannot use temp tables. I have tried using table variables instead, but the performance is absolutely terrible... even worse than the CTE version... My data subset is about 10 000 rows, but can go up to 100 000 depending on parameters supplied by the user. What are my other options, considering that I *have* to return my data as a table to the caller? Thanks.
Kharlos Dominguez
See my update. Unfortunately, I have a feeling neither of those is going to work for you.
Abe Miessler