views:

92

answers:

2

I'm starting to get a much better grasp on PostgreSQL indexing, but I've run into an issue with the OR conditional, where I don't know how to go about optimizing my indexes for a faster query.

I have 6 conditionals that, when run individually, appear to have a small cost. Here's an example of the trimmed queries, including query plan calculated times.

(NOTE: I haven't output the actual query plans for these queries below for the sake of reducing complexity, but they all use nested loop left joins and index scans as I would expect with proper indexing. If necessary, I can include the query plans for a more meaningful response.)

EXPLAIN ANALYZE SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions1)
 LIMIT 10;

QUERY PLAN
-------------------------------------------------------------------------------------
Limit  (cost=0.25..46.69 rows=1 width=171) (actual time=0.031..0.031 rows=0 loops=1)

EXPLAIN ANALYZE SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions2)
 LIMIT 10;

QUERY PLAN
-------------------------------------------------------------------------------------
Limit  (cost=0.76..18.97 rows=1 width=171) (actual time=14.764..14.764 rows=0 loops=1)

/* snip */

EXPLAIN ANALYZE SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions6)
 LIMIT 10;

QUERY PLAN
-------------------------------------------------------------------------------------
Limit  (cost=0.51..24.48 rows=1 width=171) (actual time=0.252..5.332 rows=10 loops=1)

My problem is that I want to join these 6 conditions together with OR operators, making each condition a possibility. My combined query appears more like this:

EXPLAIN ANALYZE SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions1 OR conditions2 OR conditions3 OR conditions4 OR conditions5 OR conditions 6)
 LIMIT 10;

Unfortunately, this results in a MASSIVE increase on the query plan, which no longer seems to be using my indexes (instead, choosing to do a hash left join rather than a nested loop left join, and performing various sequence scans over the previously used index scans).

Limit  (cost=142.62..510755.78 rows=1 width=171) (actual time=30.591..30.986 rows=10 loops=1)

Is there anything special I should know about indexing with regards to OR-ed conditions that would improve my final query?

UPDATE: If I use a UNION for each individual SELECT, that seems to speed up the query. However, will that prevent me from ordering my results if I choose to in the future? Here's what I did to speed up the query via UNION:

EXPLAIN ANALYZE
SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions1)
UNION
SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions2)
UNION
SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions3)
UNION
SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions4)
UNION
SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions5)
UNION
SELECT t1.*, t2.*, t3.*
  FROM t1 LEFT JOIN t2 on t2.id = t1.t2_id LEFT JOIN t3 ON t3.id = t1.t3_id
 WHERE (conditions6)
 LIMIT 10;

QUERY PLAN
-------------------------------------------------------------------------------------
Limit  (cost=219.14..219.49 rows=6 width=171) (actual time=125.579..125.653 rows=10 loops=1)
+3  A: 

Depending on the conditions, it may be logically impossible to use any index to help a complex condition using OR expressions.

Like MySQL, PostgreSQL 8.0 and earlier states in their docs on indexes:

Note that a query or data manipulation command can use at most one index per table.

With PostgreSQL 8.1, this has changed.

However, if this doesn't help, you can use the UNION solution you tried (this is a common solution for MySQL users, which continues to have a one-index-per-table limitation).

You should be able to order the results of a UNION query, but you have to use parentheses to specify that that ORDER BY applies to the result of the UNION, not merely to the last subquery in the chain.

(SELECT ... )
UNION
(SELECT ... )
UNION
(SELECT ... )
ORDER BY columnname;


I hope this helps; I'm not an expert on the PostgreSQL optimizer. You might try searching the mailing list archives, or asking on the IRC channel.

Bill Karwin
I presume the 8.1 change you point out is still determined by the query planner, and thus if it doesn't appear to be used, then I'm essentially not going to be able to use it for my Or conditions? It looks like I'll need to use UNION's, but it's good to know how to limit the results by use of parentheses. Thanks!
Matt Huggins
Also, I didn't realize that only a single index (prior to 8.1) was used in a query plan. That helps me to better understand how I should go about creating my index, so thanks for pointing that out as well.
Matt Huggins
+1  A: 

(Sorry - don't know how to reply to a reply, so this is going top level)

To clarify - PG used to only use a single index for a single table-scan. If you have a query joining three tables and each has a useful index it was always smart enough to use all three.

In your particular case what is probably happening is that you have some connection between your ORed conditions. PostgreSQL doesn't know this, and so ends up assuming it will match more rows than it actually does. Enough rows to change your query-plan.

Also your UNIONed queries aren't quite the same as the individual ones since you LIMIT each small one separately rather than the whole result-set with the UNION.

You should be able to order the results of a UNION query, but you have to use parentheses to specify that that ORDER BY applies to the result of the UNION, not merely to the last subquery in the chain.

This isn't right - the ORDER BY applies to the whole result.

HTH

Richard Huxton
Thanks for the additional help on this. I definitely know a lot more about PostgreSQL and indexing than I did before opening this question. :)
Matt Huggins
Also, to answer your question, you need to gain more rep before you can post comments. Just answer peoples' questions like this, and you'll gain rep. I think you just need 15 rep for comments, not a big hurdle at all.
Matt Huggins