views:

256

answers:

3

Hi SO, I am currently working on a particularly complex use-case. Simplifying below :)

First, a client record has a many-to-one relationship with a collection of services, that is, a single client may have multiple services associated with it.

Within my trigger, I am writing a Query that returns a client's id based on certain criteria. The criteria are as follows,

  1. If at least one service is of type B, and no services of type A exist, return id
  2. If at least one service is of type C, and no services of type B or A exist, return id
  3. If at least one service is of type D, and no services of type C or B or A exist, return id

and my current approach is to form a query similar to the one below

SELECT c.ClientId
FROM
  Clients AS c
    -- actually INNER JOIN is superfluous in this sample, but required for
    -- other auxilliary criteria i have left out. illustrates relationship
    -- between Clients and Services table
    INNER JOIN Services AS s ON c.ClientId = s.ClientId
WHERE
-- has at least one service of type B, no A
(EXISTS (SELECT * FROM Get_ServicesByClientIdAndType (c.ClientId, 'B')) AND
  NOT EXISTS (SELECT * FROM Get_ServicesByClientIdAndType (c.ClientId, 'A'))) OR 

-- has at least one service of type C, no B, no A
(EXISTS (SELECT * FROM Get_ServicesByClientIdAndType (c.ClientId, 'C')) AND
  NOT EXISTS (SELECT * FROM Get_ServicesByClientIdAndType (c.ClientId, 'B')) AND
  NOT EXISTS (SELECT * FROM Get_ServicesByClientIdAndType (c.ClientId, 'A'))) OR

-- has at least one service of type D, no C, no B, no A
(EXISTS (SELECT * FROM Get_ServicesByClientIdAndType (c.ClientId, 'D')) AND
  NOT EXISTS (SELECT * FROM Get_ServicesByClientIdAndType (c.ClientId, 'C')) AND
  NOT EXISTS (SELECT * FROM Get_ServicesByClientIdAndType (c.ClientId, 'B')) AND
  NOT EXISTS (SELECT * FROM Get_ServicesByClientIdAndType (c.ClientId, 'A')))

where [dbo].[Get_ServicesByClientIdAndType] is a function that returns associated services for specified client id and service type. Similar to

-- this query is actually significantly more complex than shown
-- below, but this illustrates use of parameters client id and
-- service type
SELECT s.ServiceType
FROM
  Services AS s
WHERE
  s.ClientId = @clientId AND
  s.ServiceType = @serviceType


Assuming this is optimal means of expressing this use-case, would function [dbo].[Get_ServicesByClientIdAndType] sub-query be cached or does changing service parameter necessitate a new evaluation each invocation? [i am invoking this thing like 9 times!!! running Sql Server 2005]

I know Sql Server 2005 supports some sub-query optimizations, like caching results, but I don't know for certain under what circumstances or how to form my sub-queries [or function] such that I make the most of Sql Server's capabilities.


EDIT: reviewed my criteria above, and couldn't let go of a nagging feeling something was off. I played around with some logic in my head, and came up with this [much simpler] formulation

SELECT c.ClientId
FROM
  Clients AS c
    INNER JOIN Services AS s ON c.ClientId = s.ClientId
WHERE
  NOT EXISTS (SELECT * FROM Get_ServicesByClientIdAndType (c.ClientId, 'A')) AND
    (EXISTS (SELECT * FROM Get_ServicesByClientIdAndType (c.ClientId, 'B')) OR 
    EXISTS (SELECT * FROM Get_ServicesByClientIdAndType (c.ClientId, 'C')) OR 
    EXISTS (SELECT * FROM Get_ServicesByClientIdAndType (c.ClientId, 'D')))

essentially, there exists no scenario involving B that would lead to rejection, similarly for C and D, so any configuration is acceptable. we only care that A is not present in any selection. Arg! Charlie Brown!


leaving both expressions up for review, and I still very much appreciate the responses regarding Sql Server's performance wrt user defined functions.

+1  A: 

I would guess that sql server calls your function Get_ServicesByClientIdAndType once for each combination of parameter values but that for every row in the Clients table. You have three combinations of values, so for 100 rows in Client table you might see 300 calls of the function.

But to be confident, run the query in sql server management studio and switch on the option "show execution plan". This way you can easily detect which part of your query consumes the most resources and conentrate on optimizing that part.

Jan
A: 

One thing to keep in mind is to avoid "NOT" if at all possible. "NOT" is non-sargable, it won't be able to take full advantages of indexing. At first glance, I don't see a way to rewrite it to avoid the NOT expressions though. FWIW, YMMV. :-)

Brian Knoblauch
+1  A: 

Hello johnny.

I was writing an answer for your question and in the meantime you have changed your requirements but you should not have any problems to convert my solution to your specific needs..

But let me start from the begining. I am pretty sure that SELECT * FROM Get_ServicesByClientIdAndType (c.ClientId, 'A') is not cached in anyway by the Server. It is not that smart ;) So it is calculated multiple times in your main query.

So your first optimization should go in that direction. You should reduce the number of times when Get_ServicesByClientIdAndType is called. You can do it in many ways. But the general rule is that that you should calculate all the possible results of this function for all of your clients. Those results should be put in some temporarty table or they will be puted into a virtual table whis is made by SQL Server itself.

When you got your all possible results you simply join them with your clients table. But you JOIN them only ONCE.

Of course many things and optimization trick depends on your real example. In the example you have given there is even no need for usage Get_ServicesByClientIdAndType. Why not simply join those two tables and perform some calculations on them?

Take a look at this query:

SELECT A.* FROM
(
 SELECT C.ClientID,
  SUM(CASE(S.ServiceType) WHEN 'A' THEN 1 ELSE 0 END) AS ServiceA,
  SUM(CASE(S.ServiceType) WHEN 'B' THEN 1 ELSE 0 END) AS ServiceB,
  SUM(CASE(S.ServiceType) WHEN 'C' THEN 1 ELSE 0 END) AS ServiceC,
  SUM(CASE(S.ServiceType) WHEN 'D' THEN 1 ELSE 0 END) AS ServiceD
 FROM Clients AS C
 INNER JOIN Services AS s ON c.ClientId = s.ClientId
 GROUP BY C.ClientID
) A
WHERE ((A.ServiceB > 0) AND (A.ServiceA = 0)) 
 OR ((A.ServiceC > 0) AND (A.ServiceA = 0) AND (A.ServiceB = 0))
 OR ((A.ServiceD > 0) AND (A.ServiceA = 0) AND (A.ServiceB = 0) AND (A.ServiceC = 0))

In the inner query we join the tables. We throw away the function since we dont need it. Instead, we calculate the number of different services for each client. Next over the inner query results we implement your conditions. We simply check for the occurence of given services in a particual set.

The result is like this:

ClientID ServiceA ServiceB ServiceC ServiceD
-------- -------- -------- -------- --------
26915       0        4        2        2
26917       0        0        1        1
26921       0        3        2        3
26927       0        4        2        4

Of course you can strip the final result from Service columns. I've included them because I like it that way ;-) And it allows to check if the query works properly. You can even write a query which will not calculate the number of given service type for a given client. It will work even faster and give you the proper results.

Also if you really need your function, why not to change its implementation in a way that the function will return and ID after first succesfull join? It will save you a lot of time.

But only you know the bigger picture so all I wrote here might be rubbish ;-)

Anyway, I hope I helped you in some way.

Wodzu
Wow, very cool. It makes sense, essentially tabulating occurrences in a single subquery, then selecting from that. I only restated my criteria in a simpler yet equivalent expression - the only effect on your solution is a simpler outer WHERE clause `WHERE A.ServiceA = 0 AND (A.ServiceB > 0 OR A.ServiceC > 0 OR A.ServiceD > 0)`
johnny g