ansaurus

Question

Answer 1

+4 A:

I would try a JOIN on the tOH table and filter the entire query by clientid. You could also use SELECT INTO to throw it in a temp table.

orthod0ks 2009-05-06 19:51:38

Answer 2

+1 A:

You may want to look into placing an index on the State column, but you should do some benchmarks on this. You'll have to weigh the benefit of the index vs the cost when inserting new rows.

You may also want to do the same to the City column.

Nathan Koop 2009-05-06 19:52:46

Answer 3

+1 A:

Use EXISTS instead of IN

AND EXISTS(SELECT 1 FROM tOH WHERE tOH.city=main.city AND clientId=@clientId)

you'll also want to make sure that city is indexed in both tables.

Al W 2009-05-06 19:53:29

Nice suggestion. Still too slow using exists.

Bryan 2009-05-06 21:35:36

Answer 4

+4 A:

From SQL Hacks:

When a subquery contains no aggregation functions, chances are you don't need a subquery - you need a JOIN.

So, you should convert your first subquery (AND CITY IN) to a JOIN. Unless you give us the rest of the query, we won't be able to show you exactly how, but the basis of it will be adding City as a table you are selecting from in the main query.

Winston Smith 2009-05-06 19:54:19

I have a pretty complicated procedure already so I was attempting to stay away from another join.

Bryan 2009-05-06 21:36:13

Answer 5

A:

select ....
from ....
    inner join tOH ON ...city=tOH.city and ...state=tOH.state
where ... and tOH.clientId = @clientId

KM 2009-05-06 19:56:29

Answer 6

+1 A:

You've probably tried this, but my first reaction would be to use populate a temp table with your cities. That may be what you're doing and I'm just not familiar with the syntax, but I've always used:

Create Table #Cities(City varchar(200))

Then you'll fill the temp table and query from it as in your example (INSERT INTO... and AND city IN (SELECT city from #Cities))

AllenG 2009-05-06 19:57:22

RolandTumble 2009-05-06 20:19:45

Roland, table variables are hardly new, they are in SQL Server 2000. But there are times when temp tables perform better so you should try both to see which is best.

HLGEM 2009-05-06 20:29:27

Because my tables were so large (1.5 million entries), this solution was optimal. Thank you!

Bryan 2009-05-07 17:02:24

Answer 7

+8 A:

Not only is it slow, it's incorrect.

Say your city is "Evansville, WI", but your tOH table only has entries for "Evansville, IN" and "Milwaukee, WI". You currently check the city and state portions separately, so your existing query will find matches for both "Evansville" and "WI". It will allow that city, even though it really shouldn't.

Do this instead:

INNER JOIN 
  ( 
    SELECT DISTINCT City AS tOHCity, State AS tOHState 
    FROM tOH 
    WHERE ClientID= @ClientID 
  ) cs ON cs.tOHCity = city AND cs.tOHState = state

Note that the subquery is based on the assumption that the DISTINCT from your original post is necessary because you could have more than one of the same city in that table per client. If that's not the case, you can just join to the tOH table directly.

Combine this with proper indexing and you should be good.

Joel Coehoorn 2009-05-06 19:57:40

Good point. In this example the chances of the same a city in two states is very rare. I wrote an was asterisk vaguely attempting to say that only one state would be returned most of the time. I should have said that the area covered by a client is smaller than the size of an area code, so crossing state lines is very rare.

Bryan 2009-05-06 21:56:04

Answer 8

+4 A:

I think it might be worthwhile to point out the reason behind the percieved timeout. Your query selects from the original table every single record, then for each record it selects it has to subquery the DISTINCT list of cities in the same table over and over again for each record.

C Bauer 2009-05-06 20:14:13

Excellent point. This is why I virtually never use a subquery. I either use a plain join as can be done in this case or a derived table.

HLGEM 2009-05-06 20:30:31

Exactly why I wanted to create a temp table to store cities - @cities

Bryan 2009-05-06 22:34:45

Well, even with the @cities temp table, you end up querying the temp table one time for every row of your main table, and you'll end up seeing the same inefficiencies. Without the entire stored procedure being posted here, I'm not sure how much help all of us can be. Out of curiosity, you are using a primary key, right?

C Bauer 2009-05-07 12:57:50

In fact, how many rows are in this table?

C Bauer 2009-05-07 13:00:06

The table has 1.5 million rows

Bryan 2009-05-07 16:23:43

Answer 9

A:

In answer to your problem of why @cities as a table variable might not work, you haven't shown the rest of the sp but I'll be you somewhere are building dyunamic SQL and executing it. That woudl be out of scope for a previously existing table variable.

HLGEM 2009-05-06 20:33:22

Answer 10

A:

You can use your original query (without joins), making next corrections:

Create index on TOH by clientID, city, state
Remove DISTINCT keyword - this allows optimizer to make efficient use of that index

Arvo 2009-05-06 20:34:28

No, he can't. His existing query is WRONG. See my answer for why.

Joel Coehoorn 2009-05-06 20:46:22

Answer 11

A:

Change your IN filter to exists. So instead of:

AND city IN (SELECT DISTINCT city from tOH where clientId = @clientId)
AND state IN (SELECT DISTINCT state from tOH where clientId = @clientId)

Change it to something like:

AND EXISTS ( 
      SELECT 0 FROM tOH t WHERE 
      ClientID = @clientId 
      and t.City = parent.ctiy 
      and t.state = parent.state 
)

I've done performance testing and have always found that EXISTS performs faster than IN. Furthermore, since you are doing it twice with the same table, you are taking a double hit.

You should also index city and state in tOH, if possible. Indexes are a give take so make sure you understand what the implications are if you add them.

Frank V 2009-05-06 20:35:39

Nice idea. Still too slow in this case.

Bryan 2009-05-06 22:28:51

Answer 12

+1 A:

I'm assuming the reason for the DISTINCT on tOH is that a city name may exist in multiple states and likewise there are multiple occurrences for a state since each state has multiple cities.

If each city and state combination is a unique occurrence it would be more appropriate and cost effective to drop the DISTINCT and do something like the following:

select mytable.* 
from mytable m
inner join tOH t on  t.clientid = @clientId 
    and t.city = m.city and t.state = m.state

catalpa 2009-05-06 20:47:53

I will try this right now.

Bryan 2009-05-06 21:47:13

Still very slow in this case.

Bryan 2009-05-06 22:33:01

ansaurus

tags:

views:

answers:

Optimize SQL Stored Procedure

related questions