views:

161

answers:

2

I asked PostgreSQL to explain my query. Part of the explanation was:

table_name --> Materialize

What does materialize do? I'm joining two tables, not views or anything like that.

+2  A: 

It means that it can't use any index (or similar method) to make the join efficient, so as a last resort is materializes the result from one of the tables to have a smaller set to work with when joining against the other table.

Guffa
That's only partially correct. In many (most) cases it will *not* put a materialize node in just because it can't use an index - it will just work off a sequential scan on each table. Materializing the output of a seqscan doesn't make a huge difference - I find it more common to show up higher up in the tree than just above a scan node.
Magnus Hagander
@Magnus: Yes, if there is any other method to join the tables, it will use it, but in this case there obviously isn't, and it will put in the materialize as the last option.
Guffa
No, it won't do that as the last option. The last option is to just do the sequential scan and not materialize at all. It does it because it can drive a smarter plan that way, and doesn't have to fall back to the last-resort.
Magnus Hagander
@Magnus: The materialize is always inserted as the last option: "If the inner input set of a Merge Join operator is not produced by a Seq Scan, an Index Scan, a Sort, or a Materialize operator, the planner/optimizer will insert a Materialize operator into the plan." http://www.iphelp.ru/faq/15/ch04lev1sec3.html
Guffa
@Guffa: That's only true if the join is a mergejoin. Which is not likely to be chosen as a last-resort plan. It can happen as last-resort, but not with a seq-scan (as we have here) - which the text you're quoting actually tells you. BTW, The version of the book you're referring to is for ancient versions of PostgreSQL - I'd check up at least twice on most of the things you find in there.
Magnus Hagander
@Guffa: My understanding is that a Materialise node is inserted only if (a) there is a cheap subplan that would need to scan its input rows more than once (such as the inner side of a plain ol' nested loop join) and (b) it would be expensive to regenerate those rows (such as if the input comes from a sequential scan of a large table but is expected to produce a small number of rows).
j_random_hacker
+4  A: 

A materialize node means the output of whatever is below it in the tree (which can be a scan, or a full set of joins or something like that) is materalized into memory before the upper node is executed. This is usually done when the outer node needs a source that it can re-scan for some reason or other.

So in your case, the planner is determining that the result of a scan on one of your tables will fit in memory, and it till make it possible to choose an upper join operation that requires rescans while still being cheaper.

Magnus Hagander
I like this answer better - there's more reason to materializing than just not using an index.
rfusca
An easy way to explain it is that it stores the intermediate result.
Grant Johnson