ansaurus

Question

Why is Postgres doing a Hash in this query?

Answer 1

A:

Without seeing the EXPLAIN ANALYZE output it's hard to be sure, but this kind of problem usually results from statistics being off, or from random_page_cost or seq_page_cost needing an unusual setting.

It may run better with

set enable_hashjoin = false;
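A minimal psql sketch of both suggestions: first inspect the two cost settings mentioned above, then disable hash joins for a single transaction only, so the change cannot leak into other queries in the session. The query is hypothetical. It only assumes the join condition A.H_id = tmp_ids.id quoted in Answer 3, so substitute the real query.

```sql
-- Inspect the planner cost settings mentioned above (defaults are 1 and 4).
SHOW seq_page_cost;
SHOW random_page_cost;

-- Discourage hash joins for this transaction only;
-- SET LOCAL reverts automatically at COMMIT or ROLLBACK.
BEGIN;
SET LOCAL enable_hashjoin = off;
EXPLAIN ANALYZE
SELECT A.*
FROM A
JOIN tmp_ids ON A.H_id = tmp_ids.id;  -- hypothetical query shape
COMMIT;
```

Comparing this EXPLAIN ANALYZE output with the one from the default settings shows whether the non-hash plan is actually faster, rather than just cheaper on paper.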

rfusca 2010-06-17 20:54:55

Answer 2

A:

The query planner estimated that it would be faster to read all the data sequentially and hash it than to perform an estimated 2100 index scans, with their much more random disk access.
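To make the trade-off concrete, here is a deliberately simplified toy comparison. It is not PostgreSQL's real cost model, which also charges CPU costs and accounts for caching and correlation. The table size of 5000 pages and the 3 random page reads per index probe are hypothetical numbers chosen for illustration; only seq_page_cost and random_page_cost are real planner settings, read here with current_setting().

```sql
-- Toy model only, NOT the planner's actual formulas.
-- Hypothetical: table A occupies 5000 pages, and each index probe
-- reads about 3 randomly located pages (index descent plus heap fetch).
WITH params AS (
    SELECT current_setting('seq_page_cost')::float8    AS seq_cost,
           current_setting('random_page_cost')::float8 AS rand_cost,
           5000 AS table_pages,
           3    AS pages_per_probe
)
SELECT v.probes,
       p.table_pages * p.seq_cost                     AS toy_seq_scan_cost,
       v.probes * p.pages_per_probe * p.rand_cost     AS toy_index_probe_cost
FROM params AS p
CROSS JOIN (VALUES (2100), (300)) AS v(probes);
```

With the default settings (seq_page_cost = 1, random_page_cost = 4), the toy sequential read costs 5000 in both rows, while 2100 probes cost 25200 and 300 probes cost 3600. Under these assumed numbers, the better plan flips depending on how many probes the planner expects.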

Stephen Denne 2010-06-21 01:22:15

Answer 3

A:

Your problem is that the optimizer doesn't have the right statistics to determine how many matches "A.H_id = tmp_ids.id" is going to produce, which is a common problem with temporary tables: they don't have statistics the way regular tables do. It guesses that 21 rows will match coming out of the "Index Scan using idx_A_handid on A", but there are actually only 3. This is highlighted in the EXPLAIN ANALYZE output, where the lowest-level up arrow has a 7 next to it, giving the factor by which the estimate was wrong (21 / 3 = 7).

That error carries forward to the point where it thinks it has 2100 rows to scan, at which point it might as well do a full sequential scan and hash the results, given that so many probes are likely to touch most blocks in the table anyway.

Had it known there were only 300 rows to probe, it might have chosen a plan that touches only a subset of the data. You can't expect to get good plans from joins against temporary tables, because of their lack of statistics. This may be a case where it's appropriate to nudge the planner toward the correct behavior by turning off enable_hashjoin before executing the query.
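One thing worth trying alongside (or before) the enable_hashjoin nudge: autovacuum cannot process temporary tables, so a temp table only gets statistics if the session runs ANALYZE on it explicitly. A minimal sketch, assuming tmp_ids has already been populated; the join query is a hypothetical shape built from the condition quoted above. Whether this fixes this particular misestimate is not guaranteed, since the per-probe estimate also depends on the statistics for A.

```sql
-- A never-analyzed table has no rows in pg_stats.
SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'tmp_ids';

-- Autovacuum never touches temp tables, so gather statistics manually
-- after tmp_ids has been filled.
ANALYZE tmp_ids;

-- Re-check the plan; the join estimates may now be closer to reality.
EXPLAIN ANALYZE
SELECT A.*
FROM A
JOIN tmp_ids ON A.H_id = tmp_ids.id;  -- hypothetical query shape
```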

Greg Smith 2010-06-22 13:04:52
