views:

50

answers:

4

I have this query in MS SQL which is acting very weird (at least from my perspective).

I've user defined function called: dbo.NajblizszaDataWyceny(3, '2010-02-05') which is simple check for TOP 1 entry in one table joined with couple others. The query itself takes like milliseconds so it's not a big problem, but i show the function anyway.

CREATE FUNCTION [dbo].[NajblizszaDataWyceny] (@idPortfela INT, @dataWaluty DATETIME)
RETURNS DATETIME
AS BEGIN
RETURN (

SELECT TOP 1         [WycenaData]
FROM    [BazaZarzadzanie].[dbo].[Wycena] t1
    LEFT JOIN [BazaZarzadzanie].[dbo].[KlienciPortfeleKonta] t3
    ON t1.[KlienciPortfeleKontaID] = t3.[KlienciPortfeleKontaID]
    LEFT JOIN [BazaZarzadzanie].[dbo].[KlienciPortfele] t4
    ON t3.[PortfelID] = t4.[PortfelID]
WHERE   [WycenaData] <= @dataWaluty  AND [t3].[PortfelID] = @idPortfela
ORDER BY [WycenaData] DESC)
END

When i use this function in following way:

DECLARE @dataWyceny DATETIME
SET @dataWyceny = dbo.NajblizszaDataWyceny(3, '2010-02-05') 

SELECT  t1.[KlienciPortfeleKontaID],
    t4.[PortfelIdentyfikator] AS 'UmowaNr',
    t5.[KlienciRachunkiNumer],
    [WycenaData],
    t2.[InISIN] AS 'InstrumentISIN',
    t2.[InNazwa] AS 'InstrumentNazwa',
    [WycenaWartosc]
FROM    [BazaZarzadzanie].[dbo].[Wycena] t1
    LEFT JOIN [BazaZarzadzanie].[dbo].[Instrumenty] t2
    ON t1.[InID] = t2.[InID]
    LEFT JOIN [BazaZarzadzanie].[dbo].[KlienciPortfeleKonta] t3
    ON t1.[KlienciPortfeleKontaID] = t3.[KlienciPortfeleKontaID]
    LEFT JOIN [BazaZarzadzanie].[dbo].[KlienciPortfele] t4
    ON t3.[PortfelID] = t4.[PortfelID]
    LEFT JOIN [BazaZarzadzanie].[dbo].[KlienciRachunki] t5
    ON t3.[KlienciRachunkiID] = t5.[KlienciRachunkiID]
    LEFT JOIN [BazaZarzadzanie].[dbo].[WycenaTyp] t6
    ON t1.[WycenaTyp] = t6.[WycenaTyp]
WHERE   WycenaData = @dataWyceny     AND t3.[PortfelID] = 3
ORDER BY t5.[KlienciRachunkiNumer],
    WycenaData

it takes 1 second to run. But when i put the user function directly in WHERE so it looks like:

SELECT  t1.[KlienciPortfeleKontaID],
    t4.[PortfelIdentyfikator] AS 'UmowaNr',
    t5.[KlienciRachunkiNumer],
    [WycenaData],
    t2.[InISIN] AS 'InstrumentISIN',
    t2.[InNazwa] AS 'InstrumentNazwa',
    [WycenaWartosc]
FROM    [BazaZarzadzanie].[dbo].[Wycena] t1
    LEFT JOIN [BazaZarzadzanie].[dbo].[Instrumenty] t2
    ON t1.[InID] = t2.[InID]
    LEFT JOIN [BazaZarzadzanie].[dbo].[KlienciPortfeleKonta] t3
    ON t1.[KlienciPortfeleKontaID] = t3.[KlienciPortfeleKontaID]
    LEFT JOIN [BazaZarzadzanie].[dbo].[KlienciPortfele] t4
    ON t3.[PortfelID] = t4.[PortfelID]
    LEFT JOIN [BazaZarzadzanie].[dbo].[KlienciRachunki] t5
    ON t3.[KlienciRachunkiID] = t5.[KlienciRachunkiID]
    LEFT JOIN [BazaZarzadzanie].[dbo].[WycenaTyp] t6
    ON t1.[WycenaTyp] = t6.[WycenaTyp]
WHERE   WycenaData = dbo.NajblizszaDataWyceny(3, '2010-02-05')      AND t3.[PortfelID] = 3
ORDER BY t5.[KlienciRachunkiNumer],
    WycenaData

It takes 1.5minute to finish. Can anyone explain why this is happening?

A: 

Using it in the where clause could be causing the optimiser to use a couple of Table Scans, which will impact on you execution time.

What i would suggest is to compare execution plans for both pieces of SQL to see if you can spot where the time is being lost.

kevchadders
+2  A: 

In the second code sample, the function is called for each row in the resulting join table. There would be many of these.

In the first, it is only ever called once.

Oded
+7  A: 

Functions are not assumed to be pure in SQL Server which means that the query optimiser will not cache the results of a function and re-use it; the function will be called every time it is referenced. This is even true for simple functions which just return numbers (as we found out to our cost on a project where we were using functions to emulate constants...).

So in the first version the function is called once when you call it, and the result is cached manually and re-used in the query. However in the second version the function will be called for each row when the WHERE clause tries to match the row. If you have a lot of rows, then a few milliseconds per row starts to add up.

(Note also, that your queries are semantically different. In the first query you're saying "where things are the same as the result of the function I evaluated at the start" and in the second one you're saying "where things are the same as the result of the function I'm evaluating at this specific instance in time as I consider the row". As your function uses a SELECT statement then - depending on the transaction isolation level - it could possibly return different results for different rows if the underlying data changes.)

Greg Beech
in addition to calling the function all over again, an index also cannot be used anymore. That may have an even bigger impact.
Thilo
Thanks this really explains a lot.
MadBoy
A: 

The database server apparently is not clever enough to decide that it can evaluate the function just once and then use it as a constant in an index.

Is this an older version of MS SQL?

Also, you may need to somehow declare the function to be deterministic (returning the same value for the same input) if MS-SQL has such an option.

Update: Just saw that your function "is simple check for TOP 1 entry in one table joined with couple others." This means that the function is not deterministic, and not independent from database data. There is not way the optimizer will be able to speed this up.

Thilo
This is MS SQL 2008, i will be running it on 2005 too.
MadBoy