views:

98

answers:

2

I have a view (actually, it's a table valued function, but the observed behavior is the same in both) that inner joins and left outer joins several other tables. When I query this view with a where clause similar to

SELECT *
FROM [v_MyView]
WHERE [Name] like '%Doe, John%'

... the query is very slow, but if I do the following...

SELECT *
FROM [v_MyView]
WHERE [ID] in 
(
    SELECT [ID]
    FROM [v_MyView]
    WHERE [Name] like '%Doe, John%'
)

it is MUCH faster. The first query is taking at least 2 minutes to return, if not longer where the second query will return in less than 5 seconds.

Any suggestions on how I can improve this? If I run the whole command as one SQL statement (without the use of a view) it is very fast as well. I believe this result is because of how a view should behave as a table in that if a view has OUTER JOINS, GROUP BYS or TOP ##, if the where clause was interpreted prior to vs after the execution of the view, the results could differ. My question is why wouldn't SQL optimize my first query to something as efficient as my second query?

EDIT

So, I was working on coming up with an example and was going to use the generally available AdventureWorks database as a backbone. While replicating my situation (which is really debugging a slow process that someone else developed, aren't they all?) I was unable to get the same results. Looking further into the query I am debugging, I realized the issue might be related to the extensive use of User Defined Scalar Valued Functions. There is heavy use of a "GetDisplayName" function that depending upon the values you pass in, it will format lastname, firstname or firstname lastname etc. If I simply omit that function and do the string formatting in the main query/TVF/view or whatever, performance is great. When looking at the execution plan, it didn't give me a clue to look at this as the issue which is why I initially ignored it.

A: 

Though I'm not SQL guru but most probably it is due to fact that in second query you are selecting only one column that makes it faster and secondly ID column seems to be some key and thus indexed. This can be the reason why it is faster the second way.

Frist Query: SELECT * FROM [v_MyView] WHERE [Name] like '%Doe, John%'

Second query: SELECT * FROM [v_MyView] WHERE [ID] in ( SELECT [ID] FROM [v_MyView] WHERE [Name] like '%Doe, John%')

TheVillageIdiot
You are right and that is what I thought. If I select ID which is in the main table of the view, and then LastUpdatedDateTime which is in a outer join of an outer join, it is a lot slower. My question is, is there something I could do in SQL to optimize it? My query returns what I need, shouldn't SQL have done this optimization internally?
Adam Carr
A: 

The scalar UDFs are very likely the issue. As soon as they go into your query you've got a RBAR execution plan. It's tolerable if they're in the SELECT but if they're being used in a WHERE or JOIN clause....

A pity because they can be very useful but they're performance killers in big SELECTs and I'd suggest trying to rewrite either the UDFs to table valued or the query to avoid the UDFs, if at all possible.

eftpotrm
These are part of the select inside the TVF. So the TVF selects from n tables and uses UDFs to format their results. Are you saying they aren't bad if I used them in my select against the TVF?
Adam Carr
I'm not quite sure I follow you? The basic point is that a scalar function forces the server to execute each row individually because it can't be optimised to a set-based solution - so, if you _have_ to use one, it should touch the minimum number of rows possible by being done after restrictions in the WHERE and JOINs. Writing SELECT * FROM Table WHERE UdfResult(Param) = 1 is to be avoided! If possible I'd move GetDisplayName out to the very final outer select, or rewrite it into set-based operations - what's it actually doing?
eftpotrm
It was calling this udf in the select of the view. So, when my where clause against the view was against a column that was generated by a udf, I got slow performance. Changing the view to implement the udf logic inline increases performance dramatically.
Adam Carr
Sounds about right - if the op can be inlined into the query it's normally hugely faster to do so (if annoying from a code reuse perspective!) than to put a UDF in a WHERE clause. Glad it's working well for you now.
eftpotrm