views:

1329

answers:

2

This question grew out of SQLServer: Why avoid Table-Valued User Defined Functions?. I began asking questions in some of the comments, and the replies to my comments moved off topic.


So that you don't have to read the entire discussion: I had never heard it said that user defined functions (UDF) were slow, or to be avoided. Some links were posted in the question referenced above to illustrate that they were slow. I still didn't get it, and asked for an example. An example was posted, and the performance difference was huge.

I can't be the only person who did not realize there could be such a large performance difference. I felt this fact should be separated into a new question and answer, to improve its chances of being found. This here is the "question". Please don't close yet, as I'd like to give the answerer time to post the answer.

Of course, others should also post answers or examples, as well. I'd especially appreciate anything that would help me understand why the performance difference is so huge.

Note also that I'm not talking about the use of a UDF in a WHERE clause. I'm aware of how this can prevent the optimizer from doing its job. I'm specifically interested in differences in performance when the original UDF was part of the SELECT column list.

+4  A: 

For benchmarking let's create a table with 1M rows:

CREATE TABLE dbo.Numbers(n INT NOT NULL PRIMARY KEY)
GO
DECLARE @i INT;
SET @i = 1;
INSERT INTO dbo.Numbers(n) SELECT 1;
WHILE @i<1024000 BEGIN
  INSERT INTO dbo.Numbers(n)
    SELECT n + @i FROM dbo.Numbers;
  SET @i = @i * 2;
END;
GO

Run simple inline adding:

SELECT COUNT(*) FROM(
SELECT n,n+1 AS ValuePlusOne
FROM  dbo.Numbers
) AS t WHERE ValuePlusOne>0

   CPU time = 15 ms, elapsed time = 122 ms.

(1 row(s) affected)
Table 'Numbers'. Scan count 1, logical reads 3521, physical reads 3, read-ahead reads 3498, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 406 ms,  elapsed time = 951 ms.

Create a scalar UDF that just adds one to an integer, and run it 1M times:

CREATE FUNCTION dbo.[AddOne] 
(
        @value int
)
RETURNS int
AS
BEGIN
        DECLARE @Result int
        SELECT @Result = @value + 1
        RETURN @Result
END
GO

SELECT COUNT(*) FROM(
SELECT n,dbo.AddOne(n) AS ValuePlusOne
FROM  dbo.Numbers
) AS t WHERE ValuePlusOne>0

   CPU time = 15 ms, elapsed time = 122 ms.

(1 row(s) affected)
Table 'Numbers'. Scan count 1, logical reads 3521, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 108313 ms,  elapsed time = 295072 ms.

Create an inline UDF, which is just as fast as just adding, and run that 1M times:

CREATE FUNCTION dbo.[AddOneInline] 
(
        @value int
)
RETURNS TABLE
AS
RETURN(SELECT @value + 1 AS ValuePlusOne)
GO

SELECT COUNT(*) FROM(
SELECT ValuePlusOne
FROM  dbo.Numbers
CROSS APPLY dbo.[AddOneInline](n)
) AS t WHERE ValuePlusOne>0

SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 35 ms.

(1 row(s) affected)
Table 'Numbers'. Scan count 1, logical reads 3521, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 391 ms,  elapsed time = 403 ms.

The difference in performance of a scalar UDF vs. an inline one is obvious.

AlexKuznetsov
A: 

Well, now that you opened a thorny subject :-) I think we need more realistic examples and to refrain from gotcha games. Examples that look too contrived always make me suspicious. So, I rearranged the query just a bit and straight scalar UDF outperformed the query. Dont' believe it - try it - this was on SQL 2k8 on a dev box under 2k8 Server Std.

All we have learned so far is that using computed columns and equivalents in WHERE clauses is bad. That query was using scalar function in WHERE clause while pretending that it's in a select.

SELECT COUNT(*) FROM( 
SELECT n as X,n+1 AS ValuePlusOne 
FROM  dbo.Numbers 
) AS t WHERE X>0 

Table 'Numbers'. Scan count 1, logical reads 3521, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 234 ms, elapsed time = 228 ms.

SELECT COUNT(*) FROM( 
SELECT n as X ,dbo.AddOne(n) AS ValuePlusOne 
FROM  dbo.Numbers 
) AS t WHERE X>0 

Table 'Numbers'. Scan count 1, logical reads 3521, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 202 ms, elapsed time = 215 ms.

So, now that we resolved that how about some real info and realistic use cases ?

I'll supply 2 for debate :-) but please remember no contrived gotchas. A TVF and a scalar UDF just calling it to get the value in a convenient way and then use in a query either as a value or in a join - no one's calculating anything. Can someone construct a table or state how pathological data would have to be in order to see perf diff between LCID1 and LCID2 ?

CREATE FUNCTION [PublishingCulture]  ( @XLanguage int,
                                 @XLocale int 
) RETURNS TABLE 
AS
RETURN 
(
    select TOP 1 * from [Culture] C
    where ((C.XLang = @XLanguage and C.XLoc = @XLocale)
      or   (C.XLang = @XLanguage and C.XLoc  = 0)
      or   (C.XLang = 0 and C.XLoc = @XLocale)
      or   (C.XLang = 0 and C.XLoc = 0))
)

CREATE FUNCTION [MyLCID1] ( @XLanguage int,
                      @XLocale int )
RETURNS TABLE
AS
     RETURN ( SELECT LCID from dbo.PublishingCulture(@XLanguage, @XLocale) )

CREATE FUNCTION [MyLCID2] ( @XLanguage int,
                      @XLocale int )
RETURNS int
AS
BEGIN
    RETURN ( SELECT LCID from dbo.PublishingCulture(@XLanguage, @XLocale) )
END

select * from 
   (select Row_number() OVER(order by StartDate) as RN, Message 
    from [Ticker] as T
    join dbo.MyLCID1(@XLanguage, @XLocale) as L on T.LCID = L.LCID
    where
      Getutcdate() BETWEEN StartDate AND EndDate
   ) AS T
where RN BETWEEN @StartIndex AND (@StartIndex + @MaxItems -1)

select * from 
   (select Row_number() OVER(order by StartDate) as RN, Message 
    from [Ticker] as T
    where
        LCID = dbo.PubLCID1(@XLanguage, @XLocale) AND
   Getutcdate() BETWEEN StartDate AND EndDate
   ) AS T
where RN BETWEEN @StartIndex AND (@StartIndex + @MaxItems -1)

[Culture] has PK on XLang,Xloc, [Ticker] has PK on LCID,Id (Id is artificial) and IX on StartDare,EndDate,LCID -- as close to something real as one can get in a few lines.

ZXX