ansaurus

Question

Subquery Caching with Sql Server 2008

Answer 1

A:

As far as I know you would need to use either a temp table or table variable for this. A comparison of the two is here.

The below uses the OUTPUT clause to fill the table variable and select from it in one statement.

declare @MatchingResults table
(
EmployeeId int primary key --Other Columns
)

INSERT INTO @MatchingResults
OUTPUT INSERTED.*
SELECT EmployeeId  --Other Columns
FROM   Employees
WHERE  BossId = 1


SELECT * 
FROM   CostCenters
WHERE  EmployeeId IN (
    SELECT EmployeeId 
    @MatchingResults))

Martin Smith 2010-06-24 16:15:21

How do I use a table variable? Am I going to gain any performance?

Martin 2010-06-24 16:17:02

@Martin - See edit. Yes you will likely gain performance but look at the execution plan. Sometimes temporary tables can perform better in cases where statistics make a difference.

Martin Smith 2010-06-24 16:29:57

Answer 2

A:

The best solution I can think of is to go with CTE

http://msdn.microsoft.com/en-us/library/ms190766.aspx

Raj More 2010-06-24 16:16:36

That isn't materialised though (most of the time). I thought the same until Quassnoi corrected me http://stackoverflow.com/questions/2855940/use-of-with-clause-in-sql-server/2856674#2856674

Martin Smith 2010-06-24 16:34:07

Answer 3

A:

Table Variables are your best option. You can also improve performance by using the exists operator for the subquery rather than in:

-- obviously the columns should match your Employees table
declare @results table (
    employeeId int,
    column1 varchar,
    column2 int
)

insert into @results
select * from Employees
where BossId = 1

-- using exists/not exists performs much better than in
select * from CostCenters
where exists ( select 0
               from @results as r
               where CostCenters.employeeId = r.employeeId )

Justin Niessner 2010-06-24 16:18:50

Answer 4

+1 A:

Caching the data of the first query will probably NOT result in better performance. When SQL Server receives the query it breaks it down to simple steps, chooses the proper indexes and operators and retrieves the data using those indexes. By storing the first query's data in a table variable or temporary table you are preventing SQL Server from using any indexes on the Employees table.

If you rewrite your query to its equivalent using JOIN it's easier to see what happens

SELECT c.* 
FROM   CostCenters c INNER JOIN Employees e on c.EmployeeId=e.EmployeeId
WHERE e.BossId=1

When SQL Server sees this query it will check the statistics of the tables. If BossId is a highly selective indexed column it may first try to filter by this. Otherwise it will use any indexes on the EmployeeId columns to limit rows from both tables to a minimum, then BossId to find the proper rows and return them.

Filtering operations on indexes are quite fast as the indexes contain only a subset of the row data, are easier to cache in memory and have a physical structure that allows quick searching.

You really shouldn't try to second-guess SQL Server's query optimizer before you encounter an actual performance problem. Most of the time you will prevent it from selecting the best execution plan and result in worse performance

Panagiotis Kanavos 2010-06-24 16:53:59

+1 There is a cost benefit analysis that needs to be done. (deleted previous rambling comment as I'm off out and don't have time to fine tune it.)

Martin Smith 2010-06-24 17:27:03

Answer 5

+2 A:

You can cache CTEs by reusing the query plan. This requires injecting the Eager Spool between the resultset produced by the function. Quassnoi makes use of it in this article, but I can't find a better example at this time. Here's another good read on Eager Spool.

OMG Ponies 2010-06-24 17:07:23

ansaurus

tags:

views:

answers:

Subquery Caching with Sql Server 2008

related questions