views:

711

answers:

12

A stored procedure that runs a SELECT on a large table is timing out. The where clause is causing the timeout, because I am selecting only cities that are in another table.

AND city IN (SELECT DISTINCT city from tOH where clientId = @clientId)
AND state IN (SELECT DISTINCT state from tOH where clientId = @clientId)

*note almost always only one state will be returned

I am trying to put the cities into a table and then use the table to populate the cities, but I am getting an error that @cities is not declared.

DECLARE @cities TABLE
(
city varchar(200)
);
INSERT INTO @cities (city) SELECT city FROM tOH WHERE clientId = @clientId GROUP BY city

Then my where clause changes to

AND city IN (SELECT city from @cities)

Can anyone figure out a good way of optimizing this stored procedure?

---------------------------- UPDATE ------------------------------------

The joins are all too slow. I think a solution with a temp table or table variable will work.

+4  A: 

I would try a JOIN on the tOH table and filter the entire query by clientid. You could also use SELECT INTO to throw it in a temp table.

orthod0ks
+1  A: 

You may want to look into placing an index on the State column, but you should do some benchmarks on this. You'll have to weigh the benefit of the index vs the cost when inserting new rows.

You may also want to do the same to the City column.

Nathan Koop
+1  A: 

Use EXISTS instead of IN

AND EXISTS(SELECT 1 FROM tOH WHERE tOH.city=main.city AND clientId=@clientId)

you'll also want to make sure that city is indexed in both tables.

Al W
Nice suggestion. Still too slow using exists.
Bryan
+4  A: 

From SQL Hacks:

When a subquery contains no aggregation functions, chances are you don't need a subquery - you need a JOIN.

So, you should convert your first subquery (AND CITY IN) to a JOIN. Unless you give us the rest of the query, we won't be able to show you exactly how, but the basis of it will be adding City as a table you are selecting from in the main query.

Winston Smith
I have a pretty complicated procedure already so I was attempting to stay away from another join.
Bryan
A: 
select ....
from ....
    inner join tOH ON ...city=tOH.city and ...state=tOH.state
where ... and tOH.clientId = @clientId
KM
+1  A: 

You've probably tried this, but my first reaction would be to use populate a temp table with your cities. That may be what you're doing and I'm just not familiar with the syntax, but I've always used:

Create Table #Cities(City varchar(200))

Then you'll fill the temp table and query from it as in your example (INSERT INTO... and AND city IN (SELECT city from #Cities))

AllenG
RolandTumble
Roland, table variables are hardly new, they are in SQL Server 2000. But there are times when temp tables perform better so you should try both to see which is best.
HLGEM
Because my tables were so large (1.5 million entries), this solution was optimal. Thank you!
Bryan
+8  A: 

Not only is it slow, it's incorrect.

Say your city is "Evansville, WI", but your tOH table only has entries for "Evansville, IN" and "Milwaukee, WI". You currently check the city and state portions separately, so your existing query will find matches for both "Evansville" and "WI". It will allow that city, even though it really shouldn't.

Do this instead:

INNER JOIN 
  ( 
    SELECT DISTINCT City AS tOHCity, State AS tOHState 
    FROM tOH 
    WHERE ClientID= @ClientID 
  ) cs ON cs.tOHCity = city AND cs.tOHState = state

Note that the subquery is based on the assumption that the DISTINCT from your original post is necessary because you could have more than one of the same city in that table per client. If that's not the case, you can just join to the tOH table directly.

Combine this with proper indexing and you should be good.

Joel Coehoorn
Good point. In this example the chances of the same a city in two states is very rare. I wrote an was asterisk vaguely attempting to say that only one state would be returned most of the time. I should have said that the area covered by a client is smaller than the size of an area code, so crossing state lines is very rare.
Bryan
+4  A: 

I think it might be worthwhile to point out the reason behind the percieved timeout. Your query selects from the original table every single record, then for each record it selects it has to subquery the DISTINCT list of cities in the same table over and over again for each record.

C Bauer
Excellent point. This is why I virtually never use a subquery. I either use a plain join as can be done in this case or a derived table.
HLGEM
Exactly why I wanted to create a temp table to store cities - @cities
Bryan
Well, even with the @cities temp table, you end up querying the temp table one time for every row of your main table, and you'll end up seeing the same inefficiencies. Without the entire stored procedure being posted here, I'm not sure how much help all of us can be. Out of curiosity, you are using a primary key, right?
C Bauer
In fact, how many rows are in this table?
C Bauer
The table has 1.5 million rows
Bryan
A: 

In answer to your problem of why @cities as a table variable might not work, you haven't shown the rest of the sp but I'll be you somewhere are building dyunamic SQL and executing it. That woudl be out of scope for a previously existing table variable.

HLGEM
A: 

You can use your original query (without joins), making next corrections:

  1. Create index on TOH by clientID, city, state
  2. Remove DISTINCT keyword - this allows optimizer to make efficient use of that index
Arvo
No, he can't. His existing query is WRONG. See my answer for why.
Joel Coehoorn
A: 

Change your IN filter to exists. So instead of:

AND city IN (SELECT DISTINCT city from tOH where clientId = @clientId)
AND state IN (SELECT DISTINCT state from tOH where clientId = @clientId)

Change it to something like:

AND EXISTS ( 
      SELECT 0 FROM tOH t WHERE 
      ClientID = @clientId 
      and t.City = parent.ctiy 
      and t.state = parent.state 
)

I've done performance testing and have always found that EXISTS performs faster than IN. Furthermore, since you are doing it twice with the same table, you are taking a double hit.

You should also index city and state in tOH, if possible. Indexes are a give take so make sure you understand what the implications are if you add them.

Frank V
Nice idea. Still too slow in this case.
Bryan
+1  A: 

I'm assuming the reason for the DISTINCT on tOH is that a city name may exist in multiple states and likewise there are multiple occurrences for a state since each state has multiple cities.

If each city and state combination is a unique occurrence it would be more appropriate and cost effective to drop the DISTINCT and do something like the following:

select mytable.* 
from mytable m
inner join tOH t on  t.clientid = @clientId 
    and t.city = m.city and t.state = m.state
catalpa
I will try this right now.
Bryan
Still very slow in this case.
Bryan