I have a query of the form:

select m.id from mytable m
left outer join othertable o on o.m_id = m.id
    and o.col1 is not null and o.col2 is not null and o.col3 is not null
where o.id is null

The query returns a few hundred records, although the tables have millions of rows, and it takes forever to run (around an hour).

When I check my index statistics using:

select * from pg_stat_all_indexes
where schemaname <> 'pg_catalog' and (indexrelname like 'othertable_%' or indexrelname like 'mytable_%')

I see that only the index for othertable.m_id is being used, and that the indexes for col1..3 are not being used at all. Why is this?

I've read in a few places that PG has traditionally not been able to index NULL values. However, I've also read that this supposedly changed as of PG 8.3. I'm currently using PostgreSQL 8.4 on Ubuntu 10.04. Do I need to create a "partial" or "functional" index specifically to speed up IS NOT NULL queries, or is it already indexing NULLs and am I just misunderstanding the problem?

+1  A: 

A single index on m_id, col1, col2 and col3 would be my first thought for this query.

And use EXPLAIN on this query to see how it is executed and what takes so much time. You could show us the results to help you out.
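For instance, a sketch of both suggestions combined, using the table and column names from the question (the index name is made up):

```sql
-- Hypothetical combined index covering the join column and the filter columns
CREATE INDEX othertable_m_id_cols ON othertable (m_id, col1, col2, col3);

-- EXPLAIN ANALYZE both shows the plan and runs the query, reporting actual timings
EXPLAIN ANALYZE
SELECT m.id
FROM mytable m
LEFT OUTER JOIN othertable o ON o.m_id = m.id
    AND o.col1 IS NOT NULL AND o.col2 IS NOT NULL AND o.col3 IS NOT NULL
WHERE o.id IS NULL;
```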

Frank Heikens
+1  A: 

You could try a partial index:

CREATE INDEX idx_partial ON othertable (m_id)
WHERE (col1 is not null and col2 is not null and col3 is not null);

From the docs: http://www.postgresql.org/docs/current/interactive/indexes-partial.html

Matthew Wood
This worked perfectly for me. A test query that took several minutes to run now runs in a couple seconds with this index.
Chris S
+1  A: 

A partial index seems the right way here:

If you have a table that contains both billed and unbilled orders, where the unbilled orders take up a small fraction of the total table and yet those are the most-accessed rows, you can improve performance by creating an index on just the unbilled rows.

Perhaps those nullable columns (col1, col2, col3) act in your scenario as a kind of flag to distinguish a subclass of records in your table (for example, some sort of "logical deletion")? In that case, besides the partial index solution, you might prefer to rethink your design and put them in different physical tables (perhaps using inheritance): one for the "live" records, another for the "historical" records, and access the full set (only when needed) through a view.
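A rough sketch of that inheritance idea; the child table and view names here are invented for illustration:

```sql
-- Child tables inherit all columns of the parent (names are assumptions)
CREATE TABLE othertable_live () INHERITS (othertable);
CREATE TABLE othertable_historical () INHERITS (othertable);

-- Queries against the parent already see rows from all children;
-- a view can expose the combined set explicitly if preferred
CREATE VIEW othertable_all AS
SELECT * FROM othertable_live
UNION ALL
SELECT * FROM othertable_historical;
```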

leonbloy
+1  A: 

Partial indexes aren't going to help you here as they'll only find the records you don't want. You want to create an index that contains the records you do want.

CREATE INDEX findDaNulls ON othertable ((COALESCE(col1,col2,col3,'Empty')))
WHERE col1 IS NULL AND col2 IS NULL AND col3 IS NULL;

SELECT * 
FROM mytable m
JOIN othertable o ON m.id = o.m_id
WHERE COALESCE(col1,col2,col3,'Empty') = 'Empty';

BTW, an anti-join written as a LEFT JOIN with an IS NULL check is generally not as fast as using EXISTS or NOT EXISTS in Postgres.
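For example, the question's query could be rewritten as a NOT EXISTS anti-join (a sketch using the original table and column names):

```sql
-- Same result set as the LEFT JOIN ... WHERE o.id IS NULL form:
-- rows of mytable with no matching fully-populated othertable row
SELECT m.id
FROM mytable m
WHERE NOT EXISTS (
    SELECT 1
    FROM othertable o
    WHERE o.m_id = m.id
      AND o.col1 IS NOT NULL
      AND o.col2 IS NOT NULL
      AND o.col3 IS NOT NULL
);
```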

Scott Bailey
This is an interesting idea, but it doesn't exactly match my logic. Your solution finds all othertable records where *all* columns are NULL. In my example, I'm interested in all records where *at least one* column is NULL.
Chris S
A: 

Did you try to create a combined index on othertable(m_id, col1, col2, col3)?

You should also check the execution plan (using EXPLAIN) rather than checking the system tables for the index usage.

PostgreSQL 9.0 (currently in beta) will be able to use an index for an IS NULL condition. That feature got postponed from an earlier release.

a_horse_with_no_name