views:

27

answers:

1

Hi All,

I have a table I'm working on that has around 3 million tuples. It doesn't change too often (a few updates or inserts per week) and is read a lot. (Please don't comment on the varchar of length 1. I know, I know).

   Column    |         Type          |                      Modifiers                       
-------------+-----------------------+------------------------------------------------------
 id          | integer               | not null default nextval('mytable_id_seq'::regclass)
 A           | character varying(5)  | not null
 B           | character varying(16) | not null
 C           | character varying(3)  | not null
 D           | character varying(1)  | not null
 otherdata   | character varying(99) | not null
Indexes:
    "mytable_pkey" PRIMARY KEY, btree (id)
    "mytable_unique_key" UNIQUE, btree (A, B, C, D)
    "mytable_B_idx" btree (B)
Foreign-key constraints:
    "$1" FOREIGN KEY (A, B) REFERENCES anothertable1(A, B)
    "$2" FOREIGN KEY (C) REFERENCES anothertable2(C)
    "$3" FOREIGN KEY (D) REFERENCES anothertable3(D)
Referenced by:
    TABLE "anothertable4" CONSTRAINT "$1" FOREIGN KEY (id) REFERENCES mytable(id)
    TABLE "anothertable5" CONSTRAINT "fkey_id" FOREIGN KEY (id) REFERENCES mytable(id) ON UPDATE CASCADE ON DELETE CASCADE

id is my primary key. A,B,C,D is a candidate key. Both obviously uniquely identify a tuple.

The most frequent queries are:

SELECT * FROM mytable WHERE B='foo'; - Will return a number of tuples

SELECT * FROM mytable WHERE A='foo' AND B='bar' AND C='baz' AND D='f'; - Will return one tuple.

Hence why there are indexes on B and A,B,C,D.

Now, for whatever reason, I'm doing the following query (And more similar):

SELECT * FROM mytable WHERE ((A='foo' AND B='bar') OR (B='foo' AND C='bar'));

One box is running PostgreSQL 8.4.4. If I EXPLAIN ANALYZE the first query, I get the following query plan:

                                                                          QUERY PLAN                                                                           
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on mytable  (cost=9.74..174.30 rows=1 width=14) (actual time=0.000..0.000 rows=5 loops=1)
   Recheck Cond: ((((A)::text = 'foo'::text) AND ((B)::text = 'bar'::text)) OR ((B)::text = 'foo'::text))
   Filter: ((((A)::text = 'foo'::text) AND ((B)::text = 'bar'::text)) OR (((B)::text = 'foo'::text) AND ((C)::text = 'bar'::text)))
   ->  BitmapOr  (cost=9.74..9.74 rows=42 width=0) (actual time=0.000..0.000 rows=0 loops=1)
         ->  Bitmap Index Scan on mytable_unique_key(cost=0.00..4.80 rows=1 width=0) (actual time=0.000..0.000 rows=0 loops=1)
               Index Cond: (((A)::text = 'foo'::text) AND ((B)::text = 'bar'::text))
         ->  Bitmap Index Scan on mytable_B_idx(cost=0.00..4.94 rows=42 width=0) (actual time=0.000..0.000 rows=316 loops=1)
               Index Cond: ((B)::text = 'foo'::text)
 Total runtime: 0.000 ms
(9 rows)

A minimum cost of 9.74 and an almost instant return (Yeh, it is cached). Now, if I run the same query on PostgreSQL 8.1.5 on another similar machine - with the exact same contents in the table - I get the following:

                                                                         QUERY PLAN                                                                      
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on mytable (cost=110156.34..110168.36 rows=3 width=26) (actual time=147200.984..147221.480 rows=5 loops=1)
   Recheck Cond: ((((A)::text = 'foo'::text) AND ((B)::text = 'bar'::text)) OR (((B)::text = 'foo'::text) AND ((C)::text = 'bar'::text)))
   ->  BitmapOr  (cost=110156.34..110156.34 rows=3 width=0) (actual time=147185.513..147185.513 rows=0 loops=1)
         ->  Bitmap Index Scan on mytable_unique_key(cost=0.00..2.01 rows=1 width=0) (actual time=83.275..83.275 rows=0 loops=1)
               Index Cond: (((A)::text = 'foo'::text) AND ((B)::text = 'bar'::text))
         ->  Bitmap Index Scan on mytable_unique_key(cost=0.00..110154.34 rows=2 width=0) (actual time=147102.230..147102.230 rows=5 loops=1)
               Index Cond: (((B)::text = 'foo'::text) AND ((C)::text = 'bar'::text))
 Total runtime: 147221.663 ms
(8 rows)

Both tables were VACUUM'ed and both boxes. So, the incredible difference is due to the differing versions and performance boosts introduced between 8.1.5 and 8.4.4. Big up to the devs!

Ok, the point of this question is not to benchmark different versions of PostgreSQL, but to ask this: How can I improve the performance of the above query? I have the following solutions (or questions):

  1. Upgrade to the latest stable PostgreSQL. We have 8.1.5 in production on many servers. Con: The upgrade task will be a long one. I don't mind too much, since it'll be ops doing it. The data will require a full dump and import. Pro: We benefit from insane performance improvement and additional features that come with the latest version.
  2. Optimize the query to help the planner along. I don't see how I can do this for the above query.
  3. Add indexes. This will help the planner and speed up execution. However it adds a bit of overhead. And what indexes will I need to add? A,B and B,C or A, B and C? The former will help with the above query. But, I have other similar queries that filter on other columns. The queries will be done on these column sets: B, B,C, A,B, A,B,C, B,C,D and A,B,C,D. Does that mean I need an index for each column set? Or just the most expensive? In the above query, scanning for B,C was the most expensive.

Thank you in advance.

+1  A: 

It looks like the mytable_unique_key-index is bloated on your 8.1-box. Try to fix this first:

REINDEX TABLE tablename;

After the reindexing, could you do a new EXPLAIN?

You should start a migration to a newer version as well, support for 8.1 will end this year.

Frank Heikens
Roger, Roger. I don't think that reindexing will help things, as both DB's were imported recently from a production backup, and neither have had their contents altered since the import. Will report back once it's done.
Nicolas
New cost=42553.34..42565.36 without the new indexes. So, an improvement, but still a heavy load. I'll accept and give ops some work. Thank you.
Nicolas
Don't forget the 8.4-planner is much smarter than the 8.1-planner. That might be the reason your 8.1-box picks the wrong index. On the 8.1-box, you might be better of using an index on B and C, not only B.
Frank Heikens