I'm noticing something a bit unexpected with how SQL Server (SQL Server 2008 in this case) treats correlated subqueries within a select statement. My assumption was that a query plan should not be affected by the mere order in which subqueries (or columns, for that matter) are written within the projection clause of the select statement. However, this does not appear to be the case.
Consider the following two queries, which are identical except for the ordering of the subqueries within the CTE:
--query 1: subquery for Color is second
WITH vw AS
(
SELECT p.[ID],
(SELECT TOP(1) [FirstName] FROM [Preference] WHERE p.ID = ID AND [FirstName] IS NOT NULL ORDER BY [LastModified] DESC) [FirstName],
(SELECT TOP(1) [Color] FROM [Preference] WHERE p.ID = ID AND [Color] IS NOT NULL ORDER BY [LastModified] DESC) [Color]
FROM Person p
)
SELECT ID, Color, FirstName
FROM vw
WHERE Color = 'Gray';
--query 2: subquery for Color is first
WITH vw AS
(
SELECT p.[ID],
(SELECT TOP(1) [Color] FROM [Preference] WHERE p.ID = ID AND [Color] IS NOT NULL ORDER BY [LastModified] DESC) [Color],
(SELECT TOP(1) [FirstName] FROM [Preference] WHERE p.ID = ID AND [FirstName] IS NOT NULL ORDER BY [LastModified] DESC) [FirstName]
FROM Person p
)
SELECT ID, Color, FirstName
FROM vw
WHERE Color = 'Gray';
If you look at the two query plans, you'll see that an outer join is used for each subquery and that the order of the joins is the same as the order the subqueries are written. There is a filter applied to the result of the outer join for color, to filter out rows where the color is not 'Gray'. (It's odd to me that SQL would use an outer join for the color subquery since I have a non-null constraint on the result of the color subquery, but OK.)
Most of the rows are removed by the color filter. The result is that query 2 is significantly cheaper than query 1 because fewer rows are involved with the second join. All reasons for constructing such a statement aside, is this an expected behavior? Shouldn't SQL server opt to move the filter as early as possible in the query plan, regardless of the order the subqueries are written?
Edit: Just to clarify, there is a valid reason I'm exploring this scenario. I may need to create a view that involves similarly constructed subqueries, and it is now apparent that any filtering based on these columns projected from the view will vary in performance just because of the ordering of the columns!