The following queries are taking 70 minutes and 1 minute respectively on a standard machine for 1 million records. What could be the possible reasons?

Query [01:10:00]

SELECT * 
FROM cdc.fn_cdc_get_net_changes_dbo_PartitionTest(
    CASE WHEN sys.fn_cdc_increment_lsn(0x00)<sys.fn_cdc_get_min_lsn('dbo_PartitionTest')  
     THEN sys.fn_cdc_get_min_lsn('dbo_PartitionTest')  
     ELSE sys.fn_cdc_increment_lsn(0x00) END
    , sys.fn_cdc_get_max_lsn()
    , 'all with mask') 
WHERE __$operation <> 1

Modified Query [00:01:10]

DECLARE @MinLSN binary(10)
DECLARE @MaxLSN binary(10)
SELECT @MaxLSN= sys.fn_cdc_get_max_lsn()
SELECT @MinLSN=CASE WHEN sys.fn_cdc_increment_lsn(0x00)<sys.fn_cdc_get_min_lsn('dbo_PartitionTest')  
     THEN sys.fn_cdc_get_min_lsn('dbo_PartitionTest')  
     ELSE sys.fn_cdc_increment_lsn(0x00) END

SELECT * 
FROM cdc.fn_cdc_get_net_changes_dbo_PartitionTest(
     @MinLSN, @MaxLSN, 'all with mask') WHERE __$operation <> 1


[Modified]

I tried to recreate the scenario with a similar function to see if the parameters are evaluated for each row.

CREATE FUNCTION Fn_Test(@a decimal)RETURNS TABLE
AS
RETURN
(
    SELECT @a Parameter, Getdate() Dt, PartitionTest.*
    FROM PartitionTest
);

SELECT * FROM Fn_Test(RAND(DATEPART(s,GETDATE())))

But I am getting the same value in the 'Parameter' column for all one million records, processed in 38 seconds.

+6  A: 

In your first query, fn_cdc_increment_lsn and fn_cdc_get_min_lsn get executed for every row. In the second example, they are executed just once.

Rubens Farias
But they are just parameters to the function. Why are they evaluated for every row? I thought the same when I first observed this, so I tried testing the case with a similar function [updated the question with details], but it finishes in the same time in both cases.
Faiz
True, but the function needs to be evaluated for every row in the output.
IronGoofy
Can anyone please answer 'why'?
Faiz
Why? Because even deterministic scalar functions are evaluated at least once per row. A deterministic scalar function called several times with identical parameters on one row will only be evaluated once for that row, but it will still be evaluated again on subsequent rows even if called with the same parameters.
Cade Roux
I think you are missing the point that the function I am calling is a table-valued function. To the best of my understanding, there is no concept of 'evaluating for every row' there. Your argument is correct if I am using a scalar function, as in "Select Fn_Scalar(Param) From Table1": the function is then evaluated for each row in Table1. In my case it is a table-valued function. If I understand correctly, the execution order for the above query is: calculate the parameter from the expression >> execute the function >> apply the filter on the result set >> select the specified columns. Tell me if I missed anything.
Faiz
Faiz, in your second case, a single random value is calculated and passed into the table function, so you get that same value over and over. The execution plan is unrelated to the one in your earlier problem.
Cade Roux
+2  A: 

Even deterministic scalar functions are evaluated at least once per row. If the same deterministic scalar function occurs multiple times on the same "row" with the same parameters, I believe only then will it be evaluated once - e.g. in a CASE WHEN fn_X(a, b, c) > 0 THEN fn_X(a, b, c) ELSE 0 END or something like that.

I think your RAND problem is because you continue to reseed:

Repetitive calls of RAND() with the same seed value return the same results.

For one connection, if RAND() is called with a specified seed value, all subsequent calls of RAND() produce results based on the seeded RAND() call. For example, the following query will always return the same sequence of numbers.
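
The example query the quoted passage refers to is missing here. As far as I recall from the RAND documentation, it looks like the sketch below; treat the exact seed value (100) as an assumption rather than a verbatim quote.

```sql
-- Seed RAND() once, then call it twice more without a seed.
-- Because the first call fixes the seed, the three values form the
-- same sequence every time this statement is run.
-- The seed value 100 is an assumption; any fixed integer seed behaves the same way.
SELECT RAND(100) AS FirstValue,
       RAND()    AS SecondValue,
       RAND()    AS ThirdValue;
```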

I have taken to caching scalar function results as you have indicated, even going so far as to precalculate tables of scalar function results and join to them. Something has to be done eventually to make scalar functions perform. Right now, the best option is the CLR; apparently CLR functions far outperform SQL UDFs. Unfortunately, I cannot use them in my current environment.
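
A minimal sketch of the precalculation idea, assuming a hypothetical table dbo.Orders, a hypothetical scalar UDF dbo.fn_Expensive and a temp table #FnCache (none of these names come from the question). The intent is to call the function once per distinct input and then join to the stored results; the optimizer does not strictly guarantee how often the function runs, so it is worth checking the actual plan.

```sql
-- Hypothetical names: dbo.Orders, dbo.fn_Expensive, #FnCache are illustrative only.

-- Step 1: compute the scalar function for each distinct input value
-- and store the results in a temp table.
SELECT d.CustomerId,
       dbo.fn_Expensive(d.CustomerId) AS Result
INTO   #FnCache
FROM  (SELECT DISTINCT CustomerId FROM dbo.Orders) AS d;

-- Step 2: join to the cached results instead of calling the function per row.
-- Note: rows with a NULL CustomerId drop out of this inner join.
SELECT o.*, c.Result
FROM   dbo.Orders AS o
JOIN   #FnCache   AS c ON c.CustomerId = o.CustomerId;

DROP TABLE #FnCache;
```

Whether this pays off depends on how many distinct inputs there are compared with the total row count.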

Cade Roux
But reseeding RAND() ensures that we get random values, right? By feeding the seconds value of the current time as the seed, I think I am ensuring that.
Faiz
You're not reseeding RAND(). You are passing in a single value for it, and it's returning a single value, and that single value is then being passed into your function.
Tom H.
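
One way to read the comment above, as a sketch: the test query from the question behaves as if the argument were computed once into a variable and then passed in. Fn_Test is the function from the question; the variable name @p is only illustrative.

```sql
-- @p is a hypothetical variable name used only for this illustration.
DECLARE @p float

-- RAND() is called once here, seeded with the current second.
SELECT @p = RAND(DATEPART(s, GETDATE()))

-- The single precomputed value is passed to the table-valued function,
-- so every returned row shows the same 'Parameter' value.
SELECT * FROM Fn_Test(@p)
```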
Sorry, when I first read the usage of RAND(), I thought you were doing SELECT fn(RAND(seed)) FROM tbl, which of course is the same as SELECT fn(a_number) FROM tbl since you are reseeding with the same value. In any case, the information regarding scalar function behavior is the correct interpretation of what's going on in your case.
Cade Roux
@Tom H: I am passing the "current second" value to RAND() to reseed. Do you mean to say that is not working? If the parameter expression is evaluated each and every time, then new seeds will be given to the RAND() function as time passes, right?
Faiz