I'm querying to return all rows from a table except those that are in some list of values that is constant at query time, e.g. `SELECT * FROM table WHERE id IN (%)`, where `%` is guaranteed to be a list of values, not a subquery. However, this list of values may be up to 1000 elements long in some cases. Should I limit this to a smaller sublist (50-100 elements is as low as I can go in this case), or will the performance gain be negligible?
Use a temporary table and JOIN against it; that gives better performance and has no limits. An `IN()` with 1000 arguments will give you problems in just about any database.
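A minimal sketch of that approach, assuming Postgres; `big_table` and the literal ids are placeholders:

```sql
-- Temp table holding the lookup keys; dropped automatically at session end.
CREATE TEMPORARY TABLE lookup_ids (id integer PRIMARY KEY);

-- Load the constant list (for 1000 values, use a multi-row INSERT or COPY).
INSERT INTO lookup_ids (id) VALUES (1), (2), (3);

-- Refresh planner statistics so it knows the temp table's size.
ANALYZE lookup_ids;

-- JOIN instead of IN(); the planner can choose a hash or merge join.
SELECT t.*
FROM big_table t
JOIN lookup_ids l ON l.id = t.id;
```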
I assume it's a large table; otherwise it wouldn't matter much.
Depending on table size and the number of keys, this may turn into a sequential scan. With many `IN` keys, Postgres often chooses not to use an index for them; the more keys, the bigger the chance of a sequential scan.
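You can check which plan you are actually getting with `EXPLAIN`; a quick sketch (`big_table` is a placeholder name):

```sql
-- Look for "Seq Scan" vs. "Index Scan" / "Bitmap Heap Scan" in the output.
EXPLAIN
SELECT * FROM big_table
WHERE id IN (1, 2, 3 /* ..., up to 1000 values */);
```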
If you use another indexed column in `WHERE`, like:

```sql
select * from table where id in (%) and my_date > '2010-01-01';
```
It's likely to fetch all rows matching the indexed column (`my_date`) and then perform an in-memory scan on them.
Using a `JOIN` against a persistent or temporary table may help, but does not have to. It will still need to locate all the rows, either with a nested loop (unlikely for large data) or with a hash/merge join.
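In Postgres you can also join against an inline `VALUES` list without creating any table; a sketch with placeholder names:

```sql
-- The VALUES list acts like a small inline table; the planner can
-- hash-join it against big_table much like a temp table.
SELECT t.*
FROM big_table t
JOIN (VALUES (1), (2), (3)) AS v(id) ON v.id = t.id;
```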
I would say the solution is:
- Use as few `IN` keys as possible.
- Use other criteria for indexing and querying whenever possible. If `IN` requires an in-memory scan of all rows, at least there will be fewer of them thanks to the additional criteria (see the sketch below).
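A minimal sketch of that second point, assuming a `my_date` column as in the earlier example; the index name and cutoff date are illustrative:

```sql
-- Index the additional criterion so the planner can narrow the row set
-- before applying the IN filter.
CREATE INDEX big_table_my_date_idx ON big_table (my_date);

-- Rows are first restricted via the my_date index; the IN list is then
-- checked against a (hopefully much smaller) set of rows in memory.
SELECT *
FROM big_table
WHERE my_date > '2010-01-01'
  AND id IN (1, 2, 3 /* ... */);
```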