views: 49

answers: 2

I'm querying to return all rows from a table whose id is in some list of values that is constant at query time, e.g. SELECT * FROM table WHERE id IN (%), where % is guaranteed to be a list of literal values, not a subquery. However, this list may be up to 1000 elements long in some cases. Should I limit it to a smaller sublist (50-100 elements is as low as I can go in this case), or will the performance gain be negligible?

+2  A: 

Use a temporary table and JOIN against it; this gives better performance and has no limits. An IN() with 1000 arguments will give you problems in any database.
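
A minimal sketch of that approach in Postgres, with hypothetical table and column names (my_table, id_list):

-- Load the constant values into a temporary table (names are hypothetical).
CREATE TEMP TABLE id_list (id integer PRIMARY KEY);
INSERT INTO id_list (id) VALUES (1), (2), (3);  -- ...up to 1000 values

-- Join against the temporary table instead of using a long IN list.
SELECT t.* FROM my_table t JOIN id_list l ON l.id = t.id;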

Frank Heikens
+3  A: 

I assume it's a large table, otherwise it wouldn't matter much.

Depending on the table size and the number of keys, this may turn into a sequential scan. If there are many IN keys, Postgres often chooses not to use an index for them: the more keys, the greater the chance of a sequential scan.

If you use another indexed column in WHERE, like:

select * from table where id in (%) and my_date > '2010-01-01';

The planner is likely to fetch all rows matching the indexed column (my_date) and then filter them in memory against the IN list.
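
You can check which plan the planner actually picks with EXPLAIN; a sketch, reusing the hypothetical names above:

EXPLAIN ANALYZE
SELECT * FROM my_table
WHERE id IN (1, 2, 3) AND my_date > '2010-01-01';
-- An "Index Scan" node with a Filter line covering the IN list means the
-- index on my_date was used; a "Seq Scan" node means the whole table was read.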

Using a JOIN against a persistent or temporary table may help, but does not have to. It will still need to locate all the rows, either with a nested loop (unlikely for large data sets) or with a hash/merge join.

I would say the solution is:

  • Use as few IN keys as possible.
  • Use other criteria for indexing and querying whenever possible. If IN requires an in-memory scan of all rows, at least there will be fewer of them thanks to the additional criteria.
Konrad Garus