ansaurus

Question

Bizarre performance issue: Common Table Expressions in inline User-Defined Function

Answer 1

+1 A:

This is a guess and just a guess, but perhaps it has something to do with how the optimizer makes a pretty good guess at the best execution plan but does not search exhaustively for one.

So, query execution works like this:

parse -> bind -> optimize -> execute

The parse trees for your two queries will certainly be different. The bind trees are probably different. I don't know enough about the bind phase to state that conclusively, but assuming the bind trees are different, then it may require a different number of transforms to get the A and B bind trees to the same execution plan.

If it takes two additional transforms to get query B to the ~5ms plan, the optimizer may say "good enough" before discovering it, whereas for query A the ~5ms plan may be just inside the search cost threshold.
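One way to test this guess, as a rough sketch: request the estimated plan as XML and check whether the optimizer reports that it stopped early. The SELECT below is only a placeholder and is not taken from the question; substitute the statement under test. StatementOptmLevel and StatementOptmEarlyAbortReason are attributes of the showplan XML.

```sql
-- SET SHOWPLAN_XML must be the only statement in its batch, hence the GO separators.
SET SHOWPLAN_XML ON;
GO

-- Placeholder statement (an assumption for illustration); replace with query A or query B.
SELECT name FROM sys.objects WHERE type = 'U';
GO

SET SHOWPLAN_XML OFF;
GO

-- In the returned XML, inspect the StmtSimple element:
--   StatementOptmLevel            = "TRIVIAL" or "FULL"
--   StatementOptmEarlyAbortReason = "GoodEnoughPlanFound", "TimeOut",
--                                   or "MemoryLimitExceeded"
--                                   (only present when optimization ended early)
```

If query B's plan shows an early-abort reason such as TimeOut while query A's does not, that would support the explanation; if no plan comes back at all, something else is going on.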

Peter 2010-01-23 04:41:19

I think that this is about as good an answer as anyone but MS can give. If it fails in optimize, it can still execute because the parse tree produced by the bind phase *is* executable. But 1) performance is awful, because it is unoptimized, 2) it is syntax and order sensitive, because it is the optimizer that takes care of that, and 3) there *is* *no* *true* *query* *plan*, because QPs include costing and the optimizer does that too. Note that all three of these symptoms are evident in the OP's question.

RBarryYoung 2010-01-27 17:26:33

As I've now mentioned in some of the question comments, it can't simply be an inefficient plan being chosen because the estimated plan never comes back at all, and the issue is reproducible on a very small table where even a triple-cartesian-product would take less than a second. As much as I wish it did, the "lazy optimizer" explanation does not seem to hold water.

Aaronaught 2010-01-27 17:34:28

It is not an inefficient plan being chosen. It is *NO* plan being chosen, so the original parse tree has to be used. (See my comments to your question, above.)

RBarryYoung 2010-01-27 19:46:36

Answer 2

A:

In the first statement, your join is

np INNER JOIN Hierarchy p
    ON p.Node = np.Node

your second statement is

Ancestors_CTE a
INNER JOIN Hierarchy p
ON p.Node = a.Node

However, a is also used as an alias for dbo.GetAncestors(c.Node.GetAncestor(1)) in the CTE. Try replacing Ancestors_CTE a with, e.g., Ancestors_CTE acte, to ensure the optimizer isn't confused by the double use of a as an alias.

That said, I'm not sure how good SQL Server is at applying the correct indexes when creating a CTE. I've had problems with this before, and used table variables instead with great success.
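A minimal sketch of the table-variable approach, assuming the schema used elsewhere in this thread (a Hierarchy table with ID, Node, Name and Level columns, a UniqueIntTable type with a Value column, and dbo.GetAncestors returning an Ancestor column). The @Ancestors variable is a hypothetical name introduced here. Because this takes more than one statement, it would have to live in a batch, a stored procedure, or a multi-statement function rather than an inline function.

```sql
-- Input IDs, as in the question's test batches.
DECLARE @IDs UniqueIntTable;
INSERT @IDs SELECT ID FROM Hierarchy;

-- Materialize the distinct ancestor nodes first instead of using a CTE.
DECLARE @Ancestors TABLE (Node hierarchyid);

INSERT @Ancestors (Node)
SELECT DISTINCT a.Ancestor
FROM Hierarchy c
CROSS APPLY dbo.GetAncestors(c.Node.GetAncestor(1)) a
WHERE c.ID IN (SELECT Value FROM @IDs);

-- Then join the materialized rows back to the hierarchy.
SELECT p.ID, p.Node, p.Name, p.[Level]
FROM @Ancestors anc
INNER JOIN Hierarchy p
    ON p.Node = anc.Node;
```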

Erik A. Brandstadmoen 2010-01-26 22:35:21

Unfortunately, that's not it; SQL Server has no problems working it out. Good eye, though. I've updated the question to make this explicit.

Aaronaught 2010-01-26 23:12:37

Answer 3

+2 A:

I've reproduced the behavior on SQL 2008 SP1, substituting a SQL UDF for the CLR UDF dbo.GetAncestors. I tried both a table-valued function and an inline function; neither one made a difference.

I don't know what is going on yet, but for the benefit of others, I'll include my definitions below.

-- try a recursive inline UDF...
CREATE FUNCTION dbo.GetAncestors(@hierarchyid hierarchyid)
RETURNS TABLE AS RETURN (
WITH recurse AS (
    SELECT @hierarchyid AS Ancestor
    WHERE @hierarchyid IS NOT NULL
    UNION ALL
    SELECT Ancestor.GetAncestor(1) FROM recurse
    WHERE Ancestor.GetAncestor(1) IS NOT NULL
    )
SELECT * FROM recurse
)

-- ...or a table-valued UDF, it makes no difference
CREATE FUNCTION dbo.GetAncestors(@hierarchyid hierarchyid)
RETURNS @return TABLE (Ancestor hierarchyid) 
AS BEGIN
    WHILE @hierarchyid IS NOT NULL BEGIN
        INSERT @return (Ancestor)
        VALUES (@hierarchyid)
        SET @hierarchyid = @hierarchyid.GetAncestor(1)
    END             
    RETURN
END

Choose one of the definitions above, and then run this to watch it hang:

DECLARE @IDs UniqueIntTable 
INSERT @IDs SELECT ID FROM Hierarchy
RAISERROR('we have inserted %i rows.',-1,-1,@@ROWCOUNT) WITH NOWAIT
SELECT * FROM dbo.GoodFunction(@IDs) a
RAISERROR('we have returned %i rows.',-1,-1,@@ROWCOUNT) WITH NOWAIT
GO

DECLARE @IDs UniqueIntTable 
INSERT @IDs SELECT ID FROM Hierarchy
RAISERROR('we have inserted %i rows.',-1,-1,@@ROWCOUNT) WITH NOWAIT
SELECT * FROM dbo.BadFunction(@IDs) a
RAISERROR('we have returned %i rows.',-1,-1,@@ROWCOUNT) WITH NOWAIT
GO

The second batch never even starts. It gets past the parse stage but appears to get lost somewhere between bind and optimize.

The bodies of both functions compile to exactly the same execution plan, outside the function wrapper:

SET SHOWPLAN_TEXT ON
GO
DECLARE @IDs UniqueIntTable 
INSERT @IDs SELECT ID FROM Hierarchy
SELECT p.ID, p.Node, p.Name, p.[Level]
FROM
(
    SELECT DISTINCT a.Ancestor AS Node
    FROM Hierarchy c 
    CROSS APPLY dbo.GetAncestors_IF(c.Node.GetAncestor(1)) a
    WHERE c.ID IN (SELECT Value FROM @IDs)
) np
INNER JOIN Hierarchy p
ON p.Node = np.Node

;WITH Ancestors_CTE AS
(
    SELECT DISTINCT a.Ancestor AS Node
    FROM Hierarchy c
    CROSS APPLY dbo.GetAncestors_IF(c.Node.GetAncestor(1)) a
    WHERE c.ID IN (SELECT Value FROM @IDs)
)
SELECT p.ID, p.Node, p.Name, p.[Level]
FROM Ancestors_CTE ac
INNER JOIN Hierarchy p
ON p.Node = ac.Node


-- both return this:

    |--Nested Loops(Inner Join, OUTER REFERENCES:([p].[Node]))
         |--Compute Scalar(DEFINE:([p].[Level]=[Scratch].[dbo].[Hierarchy].[Level] as [p].[Level]))
         |    |--Compute Scalar(DEFINE:([p].[Level]=[Scratch].[dbo].[Hierarchy].[Node] as [p].[Node].GetLevel()))
         |         |--Index Scan(OBJECT:([Scratch].[dbo].[Hierarchy].[IX_Hierarchy_Node] AS [p]))
         |--Top(TOP EXPRESSION:((1)))
              |--Filter(WHERE:([Recr1005]=[Scratch].[dbo].[Hierarchy].[Node] as [p].[Node]))
                   |--Nested Loops(Inner Join, OUTER REFERENCES:([c].[Node]))
                        |--Nested Loops(Inner Join, OUTER REFERENCES:([Value]))
                        |    |--Clustered Index Scan(OBJECT:(@IDs))
                        |    |--Clustered Index Seek(OBJECT:([Scratch].[dbo].[Hierarchy].[PK_Hierarchy] AS [c]), SEEK:([c].[ID]=[Value]) ORDERED FORWARD)
                        |--Index Spool(WITH STACK)
                             |--Concatenation
                                  |--Compute Scalar(DEFINE:([Expr1011]=(0)))
                                  |    |--Constant Scan(VALUES:(([Scratch].[dbo].[Hierarchy].[Node] as [c].[Node].GetAncestor((1)))))
                                  |--Assert(WHERE:(CASE WHEN [Expr1013]>(100) THEN (0) ELSE NULL END))
                                       |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1013], [Recr1003]))
                                            |--Compute Scalar(DEFINE:([Expr1013]=[Expr1012]+(1)))
                                            |    |--Table Spool(WITH STACK)
                                            |--Compute Scalar(DEFINE:([Expr1004]=[Recr1003].GetAncestor((1))))
                                                 |--Filter(WHERE:(STARTUP EXPR([Recr1003].GetAncestor((1)) IS NOT NULL)))
                                                      |--Constant Scan

Very interesting. Submit a bug report at Microsoft Connect, have them tell you what's going on.

Peter 2010-01-28 01:26:49

Interesting that it's reproducible without the CLR function; that will make it a lot easier to submit a bug report if it comes to that. Thanks for having the patience to get your hands dirty with this one and for confirming that it is not just a case of the optimizer giving up.

Aaronaught 2010-01-28 01:46:49

Answer 4

+5 A:

Haha, try this:

IF OBJECT_ID('_HappyFunction' ) IS NOT NULL DROP FUNCTION _HappyFunction
IF OBJECT_ID('_SadFunction'   ) IS NOT NULL DROP FUNCTION _SadFunction
IF TYPE_ID  ('_UniqueIntTable') IS NOT NULL DROP TYPE _UniqueIntTable
GO

CREATE TYPE _UniqueIntTable AS TABLE (Value int NOT NULL PRIMARY KEY)
GO

CREATE FUNCTION _HappyFunction (@IDs _UniqueIntTable READONLY)
RETURNS TABLE AS RETURN
  SELECT Value FROM @IDs
GO

CREATE FUNCTION _SadFunction (@IDs _UniqueIntTable READONLY)
RETURNS TABLE AS RETURN 
  WITH CTE AS (SELECT Value FROM @IDs)
  SELECT Value FROM CTE
GO

-- this will return an empty record set
DECLARE @IDs _UniqueIntTable 
SELECT * FROM _HappyFunction(@IDs)
GO

-- this will hang
DECLARE @IDs _UniqueIntTable 
SELECT * FROM _SadFunction(@IDs)
GO

Who would have guessed?

Peter 2010-01-28 02:35:14

That's crazy! What in the world is going on here?

Aaronaught 2010-01-28 02:41:22

It's officially on MS Connect now: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=527843. I'm marking this answer as the accepted one because it eliminated several variables from the equation (`hierarchyid`, `CROSS APPLY`, and the schema in general) and simplified the repro steps, which made the case that much easier to submit. Thanks again!

Aaronaught 2010-01-28 15:34:10

Answer 5

A:

As I understand it, when a CTE is used in a batch, the statement before it must be terminated with a ";". It has to do with how the WITH keyword is interpreted. Try this:

IF OBJECT_ID('_HappyFunction' ) IS NOT NULL DROP FUNCTION _HappyFunction  
IF OBJECT_ID('_NowHappyFunction') IS NOT NULL DROP FUNCTION _NowHappyFunction  
IF TYPE_ID  ('_UniqueIntTable') IS NOT NULL DROP TYPE _UniqueIntTable  
GO  

CREATE TYPE _UniqueIntTable AS TABLE (Value int NOT NULL PRIMARY KEY)  
GO  

CREATE FUNCTION _HappyFunction (@IDs _UniqueIntTable READONLY)  
RETURNS TABLE AS RETURN  
  SELECT Value FROM @IDs  
GO  

CREATE FUNCTION _NowHappyFunction (@IDs _UniqueIntTable READONLY)  
RETURNS @Table TABLE
(
Value INT
)
BEGIN
  ;WITH CTE AS (SELECT Value FROM @IDs)
  INSERT INTO @Table
  SELECT Value FROM CTE
  RETURN
END
GO

-- this will return an empty record set  
DECLARE @IDs _UniqueIntTable   
SELECT * FROM _HappyFunction(@IDs)  
GO  

-- this will no longer hang and will also return an empty record set 
DECLARE @IDs _UniqueIntTable   
SELECT * FROM _NowHappyFunction(@IDs)  
GO

Shane Collinsworth 2010-07-20 19:14:50

That won't execute. The ; in front of the WITH statement will cause an error preventing you from creating the function.

Chris Lively 2010-07-20 19:33:46

Corrected the syntax and changed the inline function to a multi-statement table-valued function. It compiles now.

Shane Collinsworth 2010-07-23 14:48:13
