ansaurus

Question

Impact of ordering of correlated subqueries within a projection

Answer 1

+2 A:

With the TOP operator coming into play here, the Query Optimizer is remarkably blind about the statistics, so it will look for other clues about how best to work it, such as instantiating relevant parts of the CTE first.

And it's an outer join because the subquery will be used as NULL if nothing is returned, and the system is instantiating it first. If you were using an aggregate instead of TOP, you'd probably get a slightly different but more consistent plan.

Rob Farley 2010-03-30 21:31:10

You may prefer to use row_number to get your result, as it can be an effective replacement for TOP.

Rob Farley 2010-03-30 21:34:51

Yes, I understand your point on statistics, but shouldn't the optimizer make the assumption that any filtering at worst leaves all rows, but at best removes many of the rows? Also, if you construct a more typical query using an outer join and then put a non-null constraint on a joined column, the optimizer is smart enough to use an inner join. Why not in this case?

Michael Petito 2010-03-30 21:41:02

Also I can't easily use aggregation in this case because there is no null coalescing aggregate in SQL.

Michael Petito 2010-03-31 02:22:14

A scalar subquery in the select clause can never filter the set of data, so therefore it's implemented as an outer join. The fact that you're filtering it elsewhere means that it could potentially filter out NULLs first by doing an inner join, but there's no advantage to doing this, and the QO chooses to use an outer join (which is generally quicker, as the system doesn't have to delete rows from its working set).

Rob Farley 2010-03-31 06:46:59

I know you can't use aggregates in this particular case. It would be nice if T-SQL had implemented a FIRST(Field) OVER (ORDER BY ...) function, but it doesn't have that yet.

Rob Farley 2010-03-31 06:47:50

Yes, a FIRST windowed function would obviate the need for subqueries or joins entirely and probably perform best. I wonder if that will ever become part of T-SQL?

Michael Petito 2010-03-31 12:29:09

Answer 2

+1 A:

Here is an alternate version that might perform better:

With Colors As
    (
    Select Id, [Color]
        , ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY [LastModified] DESC ) As Num
    From Preference
    Where [Color] Is Not Null
    )
    , Names As
    (
    Select Id, [FirstName]
        , ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY [LastModified] DESC ) As Num
    From Preference
    Where [FirstName] Is Not Null
    )
Select
From Person As P
    Join Colors As C
        On C.Id = P.Id
            And C.Num = 1
    Left Join Names As N
        On N.Id = P.Id
            And N.Num = 1
Where C.[Color]= 'Grey'

Another solution which is more concise but may or may not perform as well:

With RankedItems
    (
    Select Id, [Color], [FirstName]
        , ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY Case When [Color] Is Not Null 1 Else 0 End DESC, [LastModified] DESC ) As ColorRank
        , ROW_NUMBER() OVER ( PARTITION BY ID ORDER BY Case When [FirstName] Is Not Null 1 Else 0 End DESC, [LastModified] DESC ) As NameRank
    From Preference
    )
Select
From Person As P
    Join RankedItems As RI
        On RI.Id = P.Id
            And RI.ColorRank = 1
    Left Join RankedItems As RI2
        On RI2.Id = P.Id
            And RI2.NameRank = 1
Where RI.[Color]= 'Grey'

Thomas 2010-03-31 03:49:44

Thanks Thomas, I'm examining your first query now and it looks pretty good. So far it seems the generated plan is both independent of order and the joins that are filtered on are always performed first and as inner joins.

Michael Petito 2010-03-31 14:03:43

Your first query also seems to scale better than my original query in terms of IO performance since SQL scans for subsequent joins instead of using repeated index seeks. The second query is more concise (though not as clear), but unfortunately it requires expensive sorts on the result of the case statements. I can easily avoid the sorting required in the first query by clustering on ID, LastModified DESC.

Michael Petito 2010-03-31 14:45:43

ansaurus

tags:

views:

answers:

Impact of ordering of correlated subqueries within a projection

related questions