views:

1447

answers:

5

A few examples to show, just incase:

Inline Table Valued

CREATE FUNCTION MyNS.GetUnshippedOrders()
RETURNS TABLE
AS 
RETURN SELECT a.SaleId, a.CustomerID, b.Qty
    FROM Sales.Sales a INNER JOIN Sales.SaleDetail b
        ON a.SaleId = b.SaleId
        INNER JOIN Production.Product c ON b.ProductID = c.ProductID
    WHERE a.ShipDate IS NULL
GO

Multi Statement Table Valued

CREATE FUNCTION MyNS.GetLastShipped(@CustomerID INT)
RETURNS @CustomerOrder TABLE
(SaleOrderID    INT         NOT NULL,
CustomerID      INT         NOT NULL,
OrderDate       DATETIME    NOT NULL,
OrderQty        INT         NOT NULL)
AS
BEGIN
    DECLARE @MaxDate DATETIME

    SELECT @MaxDate = MAX(OrderDate)
    FROM Sales.SalesOrderHeader
    WHERE CustomerID = @CustomerID

    INSERT @CustomerOrder
    SELECT a.SalesOrderID, a.CustomerID, a.OrderDate, b.OrderQty
    FROM Sales.SalesOrderHeader a INNER JOIN Sales.SalesOrderHeader b
        ON a.SalesOrderID = b.SalesOrderID
        INNER JOIN Production.Product c ON b.ProductID = c.ProductID
    WHERE a.OrderDate = @MaxDate
        AND a.CustomerID = @CustomerID
    RETURN
END
GO

Is there an advantage to using one type (ie inline or multi statement) over the other? Is there certain scenarios when one is better than the other or are the differences purely syntactical? I realise the 2 example queries are doing different things but is there a reason I would write them in that way?

Reading about them and the advantages/differences haven't really been explained.

Thanks

+1  A: 

Your examples, I think, answer the question very well. The first function can be done as a single select, and is a good reason to use the inline style. The second could probably be done as a single statement (using a sub-query to get the max date), but some coders may find it easier to read or more natural to do it in multiple statements as you have done. Some functions just plain can't get done in one statement, and so require the multi-statement version.

I suggest using the simplest (inline) whenever possible, and using multi-statements when necessary (obviously) or when personal preference/readability makes it wirth the extra typing.

Ray
Thanks for the answer. So basically, the multi-statement is only really to be used when the function is more complicated than is feasible to do in a inline function, for the sake of readability?Are there any performance benefits at all to multi-statement?
AndyC
I don't know, but I wouldn't think so. It is probably better to let sql server figure out the optimizations that you might try to make manually (by using variables, temp tables, or whatever). Though you could certainly do some performance testing to prove/disprove this in specific cases.
Ray
Many thanks again. I may look further into this when I have more time! :)
AndyC
A: 

if you are going to do a query you can join in your Inline Table Valued function like:

SELECT
    a.*,b.*
    FROM AAAA a
        INNER JOIN MyNS.GetUnshippedOrders() b ON a.z=b.z

it will incur little overhead and run fine.

if you try to use your the Multi Statement Table Valued in a similar query, you will have performance issues:

SELECT
    x.a,x.b,x.c,(SELECT OrderQty FROM MyNS.GetLastShipped(x.CustomerID)) AS Qty
    FROM xxxx   x

because you will execute the function 1 time for each row returned, as the result set gets large, it will run slower and slower.

KM
Ah, so you would say that the inline is much better in terms of performance?
AndyC
No, they both return a table, which makes your second SQL invalid as you are trying to put a table in a column.
ck
@ck, I've updated the query yo commented on. the parameters of the function used in the second function lend it to be used as a sub query, which will result in worse performance.
KM
+1  A: 

In researching Matt's comment, I have revised my original statement. He is correct, there will be a difference in performance between an inline table valued function (ITVF) and a multi-statement table valued function (MSTVF) even if they both simply execute a SELECT statement. SQL Server will treat an ITVF somewhat like a VIEW in that it will calculate an execution plan using the latest statistics on the tables in question. A MSTVF is equivalent to stuffing the entire contents of your SELECT statement into a table variable and then joining to that. Thus, the compiler cannot use any table statistics on the tables in the MSTVF. So, all things being equal, (which they rarely are), the ITVF will perform better than the MSTVF. In my tests, the performance difference in completion time was negligible however from a statistics standpoint, it was noticeable.

In your case, the two functions are not functionally equivalent. The MSTV function does an extra query each time it is called and, most importantly, filters on the customer id. In a large query, the optimizer would not be able to take advantage of other types of joins as it would need to call the function for each customerId passed. However, if you re-wrote your MSTV function like so:

CREATE FUNCTION MyNS.GetLastShipped()
RETURNS @CustomerOrder TABLE
    (
    SaleOrderID    INT         NOT NULL,
    CustomerID      INT         NOT NULL,
    OrderDate       DATETIME    NOT NULL,
    OrderQty        INT         NOT NULL
    )
AS
BEGIN
    INSERT @CustomerOrder
    SELECT a.SalesOrderID, a.CustomerID, a.OrderDate, b.OrderQty
    FROM Sales.SalesOrderHeader a 
        INNER JOIN Sales.SalesOrderHeader b
            ON a.SalesOrderID = b.SalesOrderID
        INNER JOIN Production.Product c 
            ON b.ProductID = c.ProductID
    WHERE a.OrderDate = (
                        Select Max(SH1.OrderDate)
                        FROM Sales.SalesOrderHeader As SH1
                        WHERE SH1.CustomerID = A.CustomerId
                        )
    RETURN
END
GO

In a query, the optimizer would be able to call that function once and build a better execution plan but it still would not be better than an equivalent, non-parameterized ITVS or a VIEW.

ITVFs should be preferred over a MSTVFs when feasible because the datatypes, nullability and collation from the columns in the table whereas you declare those properties in a multi-statement table valued function and, importantly, you will get better execution plans from the ITVF. In my experience, I have not found many circumstances where an ITVF was a better option than a VIEW but mileage may vary.

Thanks to Matt.

Thomas
This simply isn't true - Multi-statement functions are very very often a huge performance hit because they stop the query optimiser from using statistics. If I had $1 for every time I've seen multi-statement function use cause a very poor choice of execution plan (mostly because it usually estimates the returned row count as 1), I'd have enough to buy a small car.
Matt Whitfield
Thanks for the clear answer and comments guys.
AndyC
+1  A: 

Internally, SQL Server treats an inline table valued function much like it would a view and treats a multi-statement table valued function similar to how it would a stored procedure.

When an inline table-valued function is used as part of an outer query, the query processor expands the UDF definition and generates an execution plan that accesses the underlying objects, using the indexes on these objects.

For a multi-statement table valued function, an execution plan is created for the function itself and stored in the execution plan cache (once the function has been executed the first time). If multi-statement table valued functions are used as part of larger queries then the optimiser does not know what the function returns, and so makes some standard assumptions - in effect it assumes that the function will return a single row, and that the returns of the function will be accessed by using a table scan against a table with a single row.

Where multi-statement table valued functions can perform poorly is when they return a large number of rows and are joined against in outer queries. The performance issues are primarily down to the fact that the optimiser will produce a plan assuming that a single row is returned, which will not necessarily be the most appropriate plan.

As a general rule of thumb we have found that where possible inline table valued functions should be used in preference to multi-statement ones (when the UDF will be used as part of an outer query) due to these potential performance issues.

Paul McLoughlin
A: 

Another answer to the question which I don't think I saw above is that one tool is necessary when the other will not suffice, namely, multi-statement-table can do what an inline-table cannot in some cases. I'm not positive about all this, but here is my current situation (and how I found this post):

I have a table (rIdent) that is a copy of another table (Ident), except that rIdent increments changes. So, the primary key of Ident is not a primary key in rIdent, and is not unique. There is a Timestamp type associated with Ident, however, so the most recent version of the table can be queried. There is another table with Timestamps (rowvers) and DateTime (time) which associate a Timestamp with a Datetime. This was done in hopes that it would be simple to get a version of the table as of any date.

I wrote a stored procedure to get that information. It worked well, but I wanted to be able to filter the results without having to create a temp table every time, so I read a little, and someone said to create a Function. I'm now creating a function, and I would like to use the Inline function because then I don't have to specify column names (there are a considerable amount of columns, and I plan on doing this on 5 or 6 tables, and maintaining them).

Bottom Line

It looks like I'm going to have to create a MSTVF because I have to run a stored procedure in the function. It looks something like this right now and won't run:

CREATE FUNCTION f_ident
(
    @date DATETIME = getdate,  --requested date
    @Actual_Date = getdate
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
(
    DECLARE @Rowvers timestamp;
    EXECUTE sp_rowvers @date, @actual_date out, @rowvers out

    SELECT *
    FROM rIdent
    WHERE rowvers in (
        SELECT MAX(rowvers)
        FROM dbo.rIdent
        WHERE rowvers < @Rowvers
        GROUP BY IDENT_ID)
    ORDER BY rowvers;
)

If I tried to create this as is, there is an error on DECLARE because the function syntax expects SELECT. I'll have to create the thing as a MSTVF and declare all the column names from the table rMosaiq.

Correct me if I'm wrong; I could use the help!

Limited Atonement