views:

171

answers:

3

I have an expensive scalar UDF that I need to include in a select statement and use that value to narrow the results in the where clause. The UDF takes parameters from the current row so I can't just store it in a var and select from that.

Running the UDF twice per row just feels wrong:

Select someField, 
       someOtherField, 
       dbo.MyExpensiveScalarUDF(someField, someOtherField)
from someTable
where dbo.MyExpensiveScalarUDF(someField, someOtherField) in (aHandfulOfValues)

How do I clean this up so that the function only runs once per row?

+2  A: 

I'd need a lot more detail before I could address this specific question, but two general ideas hit me right off:

(1) Can you make it a table-based function, join it in the FROM clause, and work from there?

(2) Look into the OUTER APPLY and CROSS APPLY join clauses. Essentially, they allow joins on table-based functions, where the parameters passed to the function are based on the row being joined (as opposed to a single call). Good examples of this are in BOL.

Philip Kelley
+3  A: 

Just because you happen to mention the function twice doesn't mean it will be computed twice per row. With luck, the query optimizer will computed it only once per row. Whether it does or not may depend in part on whether the UDF appears to be deterministic or nondeterministic.

Take a look at the estimated execution plan. Maybe you'll find out you're worrying about nothing.

If it's computed twice, you could try this and see if it changes the plan, but it's still no guarantee:

WITH T(someField,someOtherField,expensiveResult) as (
  select someField, someOtherField, dbo.MyExpensiveScalarUDF(someField, someOtherField)
  from someTable
)
  select * from T
  where expensiveResult in (thisVal,thatVal,theotherVal);
Steve Kass
Thanks - this dropped the execution time from around 40 seconds to a hair under 1 second.
Rob Allen
Great! Thanks for providing the feedback on how it worked out.
Steve Kass
Is there a name for this technique/syntax?
Rob Allen
The table T in the "WITH" clause is called a CTE (common table expression), but I don't know a particular name for the technique of using a CTE (or equivalent subquery) to avoid needing to repeat an expression.
Steve Kass
+2  A: 

Steve is correct - the query plan will probably not re-evaluate identical expressions if the UDF is deterministic.

However, repeating yourself is a potential maintenance problem:

WITH temp AS (
Select someField, 
       someOtherField, 
       dbo.MyExpensiveScalarUDF(someField, someOtherField) AS scalar
from someTable
)
SELECT *
FROM temp
where scalar in (aHandfulOfValues)

You can avoid it with a CTE or nested query.

Scalar UDFs are best to be avoided if at all possible for rowsets of any significant size (say a half million evaluations). If you expand it inline here (and with a CTE you won't have to repeat yourself), you'll probably find a huge performance boost. Scalar UDFs should be a last resort. In my experience you're far better off using a persisted computed column, or inline or just about any other technique before relying on a scalar UDF.

Cade Roux
Good point, Cade. At least when the scalar calculation is a single expression, it is generally far better to replace the scalar UDF with a table-valued UDF that returns (SELECT <the same thing>) and then write the query with CROSS APPLY. Seems kludgy at first, but eventually it begins to feel right, and it works well.
Steve Kass