I have a question about how MS SQL evaluates functions inside CTEs. A couple of searches didn't turn up any results related to this issue, but I apologize if this is common knowledge and I'm just behind the curve. It wouldn't be the first time :-)

This query is a simplified (and obviously less dynamic) version of what I'm actually doing, but it does exhibit the problem I'm experiencing. It looks like this:

CREATE TABLE #EmployeePool(EmployeeID int, EmployeeRank int);

INSERT INTO #EmployeePool(EmployeeID, EmployeeRank)
  SELECT 42, 1
  UNION ALL 
  SELECT 43, 2;

DECLARE @NumEmployees int;
SELECT @NumEmployees  = COUNT(*) FROM #EmployeePool;

WITH RandomizedCustomers AS (
  SELECT CAST(c.Criteria AS int) AS CustomerID,
         dbo.fnUtil_Random(@NumEmployees) AS RandomRank
    FROM dbo.fnUtil_ParseCriteria(@CustomerIDs, 'int') c)
SELECT rc.CustomerID,
       ep.EmployeeID
  FROM RandomizedCustomers rc
  JOIN #EmployeePool ep ON ep.EmployeeRank = rc.RandomRank;

DROP TABLE #EmployeePool;

The following can be assumed about all executions of the above:

  • The result of dbo.fnUtil_Random() is always an int value greater than zero and less than or equal to the argument passed in. Since it's being called above with @NumEmployees which has the value 2, this function always evaluates to 1 or 2.

  • The result of dbo.fnUtil_ParseCriteria(@CustomerIDs, 'int') produces a one-column, one-row table that contains a sql_variant with a base type of 'int' that has the value 219935.

Given the above assumptions, it makes sense (to me, anyway) that the query above should always produce a two-column table containing one record: a CustomerID and an EmployeeID. The CustomerID should always be the int value 219935, and the EmployeeID should be either 42 or 43.

However, this is not always the case. Sometimes I get the expected single record. Other times I get two records (one for each EmployeeID), and still others I get no records. However, if I replace the RandomizedCustomers CTE with a true temp table, the problem vanishes completely.

Every time I think I have an explanation for this behavior, it turns out to not make sense or be impossible, so I literally cannot explain why this would happen. Since the problem does not happen when I replace the CTE with a temp table, I can only assume it has something to do with how functions inside CTEs are evaluated during joins to that CTE. Do any of you have any theories?

+1  A: 

SQL Server's optimizer is free to decide whether to reevaluate a CTE or not.

For instance, this query:

WITH    q AS
        (
        SELECT  NEWID() AS n
        )
SELECT  *
FROM    q
UNION ALL
SELECT  *
FROM    q

will produce two different NEWID() values. However, if you use a cached XML plan to wrap the CTE in an Eager Spool operation, the two records will be the same.
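Applied to the query in the question, this would explain the varying row counts: if dbo.fnUtil_Random() is re-evaluated for each #EmployeePool row the join compares, each comparison can match or miss independently, giving zero, one, or two rows. A minimal sketch of the temp-table workaround the question mentions follows. It assumes #EmployeePool, @NumEmployees, and @CustomerIDs exist in the same batch, as in the original query; the name #RandomizedCustomers is only illustrative.

```sql
-- Materialize the CTE's rows first, so each customer's random rank is
-- computed once and stored before the join reads it.
-- Assumes #EmployeePool, @NumEmployees and @CustomerIDs are defined
-- earlier in the same batch, as in the question.
SELECT CAST(c.Criteria AS int) AS CustomerID,
       dbo.fnUtil_Random(@NumEmployees) AS RandomRank
  INTO #RandomizedCustomers
  FROM dbo.fnUtil_ParseCriteria(@CustomerIDs, 'int') c;

-- The join now compares against stored values, not a function call.
SELECT rc.CustomerID,
       ep.EmployeeID
  FROM #RandomizedCustomers rc
  JOIN #EmployeePool ep ON ep.EmployeeRank = rc.RandomRank;

DROP TABLE #RandomizedCustomers;
```

Because the temp table stores the function's result, the optimizer has nothing left to re-evaluate, which is consistent with the question's observation that the problem vanishes with a true temp table.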

Quassnoi
I totally understand now. Thank you very much for your help!
Jammer