I have a query that includes this:

... AND Record.RecordID IN (1,2,3,10,11,12,13,16,17,18,26,27,28,557,31,32,33,36,37,93) AND ...

The problem seems to be that if there are 20 or more items in that list, the query takes over 25 seconds to execute. If there are fewer than 20, it executes immediately. Any ideas on how to optimize?

A: 

It seems dirty and unnecessary, but have you tried:

(Record.RecordID IN (--19 items--) OR Record.RecordID = 20th_item) AND

I don't know why adding the 20th item to the IN group would push it over the edge.

Sarah Vessels
`col1 IN (a,b,c)` is literally the same as `col1 = a or col1 = b or col1 = c`, so I doubt this will help
Andomar
+9  A: 

One thing to do would be to look at the optimizer plan (if you can) and see how the plan differs when you use fewer than 20 items vs. 20 or more. In Oracle, for example, you can use the EXPLAIN PLAN command to see this output.

Here's some info on how to use explain plan in Oracle: http://download.oracle.com/docs/cd/B10501%5F01/server.920/a96533/ex%5Fplan.htm
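
As a rough sketch of what that looks like in Oracle (the table name Records and the shape of the query are assumptions for illustration; substitute your real query):

```sql
-- Assumes a table named Records with a RecordID column (illustrative names).
EXPLAIN PLAN FOR
SELECT *
FROM Records
WHERE RecordID IN (1,2,3,10,11,12,13,16,17,18,26,27,28,557,31,32,33,36,37,93);

-- Display the plan that EXPLAIN PLAN just wrote to PLAN_TABLE.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Run it once with 19 values and once with 20, then compare the two outputs. If the access step changes from an index operation to TABLE ACCESS FULL, that is the plan flip you are looking for.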

Another thing to consider is whether or not you have an index on RecordID. It may be that once you cross a certain threshold (20 items in your case) the optimizer decides a full table scan is cheaper than using your index.

Sometimes with some databases you can use optimizer hints to persuade the optimizer to use an index if that indeed results in better performance.

Here's a link to optimizer hints you can read: http://download.oracle.com/docs/cd/B19306%5F01/server.102/b14211/hintsref.htm

My answer is Oracle-centric, but the same principles should apply to most any database.

dcp
Same applies to MSSQL: a look at the execution plan will explain.
Alex
+9  A: 

Place the RecordIDs in a temporary table (or a table variable, as here) and use an inner join to filter on them. For SQL Server, this looks like:

declare @RecordIds table (RecordID int)
insert into @RecordIds values (1)
insert into @RecordIds values (2)
...
insert into @RecordIds values (93)

select r.*
from Records r
inner join @RecordIds ri on ri.RecordID = r.RecordID
Andomar
+1: Look at pipelined functions for Oracle: http://www.akadia.com/services/ora_pipe_functions.html
OMG Ponies
Good advice, but due to some restrictions, I would really prefer to accomplish this without temporary tables...
Raggedtoad
@Raggedtoad: Then consider a CLR table valued function: http://msdn.microsoft.com/en-us/library/ms131103.aspx
OMG Ponies
A: 

For MySQL, the manual says "The number of values in the IN list is only limited by the max_allowed_packet value." It does seem unlikely that this is the issue, but it's a place to look.

In any event, storing your IN() values in a temp table and joining your query to it should get round the whole problem.

dnagirl
+2  A: 

The 20th item happens to tip the cost estimate for this particular query from one plan to the other. With 20 items you are probably getting a full table scan. IN is just syntactic sugar for OR ... OR ... OR, and OR is the enemy of good query plans. Use a join, as Andomar suggested.

Update

If you move away from the IN syntax, you can also use a query plan hint to make sure the query stays on the optimal plan. The IN syntax forces you to change the query text with each execution, so you cannot use a query plan hint.
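
A minimal sketch of a join with a hint, assuming SQL Server and the Records/RecordID names from Andomar's example. The choice of OPTION (LOOP JOIN) is only an illustration; which hint (if any) helps depends on your data and indexes:

```sql
-- Assumes SQL Server; Records and RecordID are illustrative names.
DECLARE @RecordIds TABLE (RecordID int PRIMARY KEY);
INSERT INTO @RecordIds (RecordID) VALUES (1);
INSERT INTO @RecordIds (RecordID) VALUES (2);
INSERT INTO @RecordIds (RecordID) VALUES (93);

-- The query text stays constant no matter how many IDs are loaded,
-- so the same hint applies on every execution.
SELECT r.*
FROM @RecordIds AS ri
INNER JOIN Records AS r ON r.RecordID = ri.RecordID
OPTION (LOOP JOIN);  -- ask for nested loop joins throughout this query
```

With an index on Records.RecordID, a nested loop join typically lets the engine seek once per listed ID instead of scanning the table. Check the actual plan before relying on it.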

Remus Rusanu
A: 

A CLR table valued function would be another way to create a table based on the provided parameters - for more info, see SQL Server 2005: CLR Table-Valued Functions

OMG Ponies
A: 

It seems that when you add the 20th item, the optimizer generates a different execution plan. Execution plans are built from statistics, and the search criteria affect the expected number of result rows. As you add more items to the criteria list, the expected row count changes and the optimizer might generate a new execution plan.

Check the execution plan (CTRL-L) of both queries. It's the only way to learn why it takes more time when you have 20 or more items in the list.
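
If you prefer to get the plan from a script instead of the keyboard shortcut, something along these lines should work in SQL Server (the Records table name and the query shape are assumed for illustration):

```sql
-- Assumes SQL Server and a table named Records with a RecordID column.
SET SHOWPLAN_TEXT ON;
GO

-- While SHOWPLAN_TEXT is on, the query is not executed;
-- only its estimated plan is returned.
SELECT r.*
FROM Records AS r
WHERE r.RecordID IN (1,2,3,10,11,12,13,16,17,18,26,27,28,557,31,32,33,36,37,93);
GO

SET SHOWPLAN_TEXT OFF;
GO
```

Run it with 19 and with 20 values in the list and compare the two outputs.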

Before examining the execution plans, update the statistics of your table:

UPDATE STATISTICS records

Or, if you can wait:

UPDATE STATISTICS records WITH FULLSCAN

The second one takes longer, but it reads every row, so you get more accurate statistics.

yioann