views:

111

answers:

4

In the query below is it better for performance to move the case statement to a UDF? If so why?

SELECT id,
       name,
       case
         when v1 = 'Y' then 'something'
         when v2 = 'K' then 'something else'
         when v3 is null then 'dont know'
         else 'default'
       end
from table
+2  A: 

No I would leave it just like it is. Using UDFs can (in general) have some wierd side effects and limitations and this query simply doesn't justify using them. UDF performance should be no better than what you get from this direct query.

Here's a relevant blog entry: http://www.bennadel.com/blog/964-SQL-User-Defined-Functions-Are-Slower-Than-Inline-Logic.htm

Michael Bray
good stuff. thanks for the info
Abe Miessler
The only weird side effect of a UDF can possibly be performance degradation because you're not allowed to make any changes to the data from the UDF in any way. In other words, UDFs can only perform selects and calculations and not inserts/updates.
Paul Sasik
I added a link that describes some other wierd limitations of UDFs (such as you can't use non-deterministic built-in functions such as GETDATE)
Michael Bray
+1  A: 

A UDF would usually perform worst then a case statement in a query. But with most SQL, it is best to try both and compare the results.

Irwin M. Fletcher
A: 

It will probably not have any effect if you move into a separate UDF. Sql Server optimizes these types of operations for performance and your UDF would probably get "inlined" anyway.

Paul Sasik
Thats interesting, it will inline the code somehow? Do you have reference for that?
AnthonyWJones
A: 

By the way (sorry, I can't remember where I read this) the optimizer has a chance of getting a better execution plan when using the "short" syntax form of the case statement. Even though ultimately the short form is just syntactic sugar, it gives the optimizer a hint that all comparisons are simple equality-based without complex expressions being involved.

Your case statement as given can't be switched around because you're using 3 different columns, v1, v2, and v3. But if you were using all the same column, say v1, then this modification could at times perform better:

SELECT
   id,
   name,
   case Coalesce(v1, 'dont know')
      when 'Y' then 'something'
      when 'K' then 'something else'
      when 'dont know' then 'dont know'
      else 'default'
   end
from table

Okay, maybe this isn't ideal because of the conniptions required for null handling. I just wanted to point out a possible optimization that can be made in some cases, since you are looking for general optimization hints.

For what it's worth, I completely agree that a UDF will perform worse. Inlining nearly universally gives a performance improvement because it avoids all the work necessary to make a call and return from the call (unless the optimizer secretly inlines it for you).

Another idea in this specific case is that you might consider using a join with a derived table full of literal values:

SELECT
   id,
   name,
   coalesce(x.result, 'dont know')
from
   table t
   left join (
       select 'Y', 'something'
       union all select 'K', 'something else'
       union all select '%', 'default'
   ) x (value, result) on t.v1 like x.value

Again, this may not be perfect because of the way I chose to do the null handling with LIKE. However, it adequately demonstrates the technique.

Additional points:

  • Testing is always in order to find out what gives the best performance in any situation. Stuff a table full of 100,000 or a million rows and do some tests with profiler running to capture the cpu, reads, writes, and duration. Prefer lower cpu/reads over lower duration.

  • While execution plans are often good guides, don't absolutely trust execution plan costs because they completely ignore the internal costs of UDFs and do not always expose cpu/read tradeoffs properly.

  • It's usually better to spend a little cpu if it avoids even a handful of reads. The "constant scan" that shows up in execution plans for derived tables like I used above will always have minuscule cost. This cost is far less than almost any number of reads involving disk access.

Emtucifor