I find it cumbersome to create a trigger just to get the current total number of rows in a table without running SELECT COUNT(*) FROM table. I'm wondering whether the index-organized tables planned for Postgres 8.5 could make this possible?

+2  A: 

I wouldn't have thought an index-organised table would necessarily be faster to scan to count all the visible tuples. Logically, it would have to go through the same amount of data, whether it's organised so that data is in b-tree leaf nodes or in the existing heap format.

Currently, PostgreSQL indices store (essentially) only [key,ctid] pairs. (A ctid is essentially a "rowid": a heap page number plus a tuple line pointer index.) So you can't count the rows in the table just by going through the index, because you need to check [xmin,xmax] for each tuple, and that's only kept with the data, in the heap.
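To make that concrete, here is a toy Python model of the point above. It is only a sketch under loud assumptions: the visibility rule is drastically simplified (it ignores commit status, in-progress transactions, and hint bits), and the names `HeapTuple`, `is_visible`, and `count_visible` are made up for illustration, not PostgreSQL internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeapTuple:
    xmin: int                    # transaction that created this row version
    xmax: Optional[int] = None   # transaction that deleted/replaced it, if any


def is_visible(tup: HeapTuple, snapshot_xid: int) -> bool:
    """Toy rule (assumption): visible if created at or before the snapshot
    and not deleted at or before it."""
    created = tup.xmin <= snapshot_xid
    deleted = tup.xmax is not None and tup.xmax <= snapshot_xid
    return created and not deleted


# Heap: ctid (page number, line pointer) -> tuple header.
heap = {
    (0, 1): HeapTuple(xmin=10),
    (0, 2): HeapTuple(xmin=11, xmax=15),  # old row version, replaced by xid 15
    (0, 3): HeapTuple(xmin=15),           # its replacement
    (1, 1): HeapTuple(xmin=30),           # inserted later
}

# Index: [key, ctid] pairs only, with no xmin/xmax.
index = [(1, (0, 1)), (2, (0, 2)), (2, (0, 3)), (3, (1, 1))]


def count_index_entries(index) -> int:
    """What you get from the index alone: dead and future versions included."""
    return len(index)


def count_visible(index, heap, snapshot_xid: int) -> int:
    """A correct count has to follow every ctid into the heap."""
    return sum(1 for _key, ctid in index if is_visible(heap[ctid], snapshot_xid))
```

In this model the index holds four entries, but a snapshot taken at transaction 20 sees only two rows, and getting that answer meant visiting every heap tuple anyway.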

You could put [xmin,xmax] in the index as well; suggestions for this crop up from time to time. But this bloats the indices, and to be useful, every update and delete would have to keep that information up to date. That causes problems, not least because the work involved in an update is now multiplied by the number of indices on the table. For heavy indices, such as those on tsvector or ones based on costly user expressions, this could take a while, and in some nasty cases it might not work at all, leaving rows that appear live in the index but are dead in the heap. Yet the whole point of the exercise was to let the database rely exclusively on the liveness information in the index where possible. The cost would be incurred even when updating a non-indexed column, something the team went to some effort to speed up in 8.3 (heap-only tuples).
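A rough counting model of that cost, in Python, may help. It is a sketch, not a measurement: the function `index_writes_per_update` and the "two writes per index" figure for the hypothetical scheme (stamp xmax on the old entry, insert a new entry carrying xmin) are my assumptions, and the heap-only-tuple case assumes the new row version fits on the same heap page.

```python
def index_writes_per_update(num_indexes: int,
                            indexed_column_changed: bool,
                            visibility_in_index: bool) -> int:
    """Count index entries touched by one UPDATE in a toy model."""
    if visibility_in_index:
        # Hypothetical scheme (assumption): every index must mark the old
        # entry dead (xmax) and add an entry for the new version (xmin),
        # whether or not the changed column is indexed.
        return 2 * num_indexes
    if indexed_column_changed:
        # Existing behaviour: a new entry in each index pointing at the
        # new row version; old entries are cleaned up later.
        return num_indexes
    # Heap-only tuple update: no index is touched at all.
    return 0
```

For a table with five indices, updating a non-indexed column goes from 0 index writes today to 10 in this model, which is the regression described above.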

I suppose one possibility would be to let indices optionally carry [xmin,xmax], e.g. mark only the pkey index that way. Then there would have to be planner changes to figure out when this was an advantage. It seems like quite a bit of work.

Index-organised tables, if they work as I believe they do in Oracle (and SQL Server, where any table with a clustered index is basically index-organised), work by storing [key,tuple] in the primary key index instead (and presumably [key,pkey] in all the others): no ctid, no heap. So "tuple" will contain [xmin,xmax,cminmax,natts,....] etc., and you could satisfy "select count(*) from table" just by scanning the index. But this is essentially the same as scanning the tuples on the heap. They don't magically take up less space because they're now in an "index".

AFAICT the main reason for an index-organised table is that a small table with a single primary-key index will take up 1 page instead of 3, and index scans by primary key may be a bit faster. I do seem to remember the Oracle-related advice I was given for IOTs was that they were intended for static dimension tables, and not general purpose use, partly due to the cost imposed on maintaining secondary indices (I don't think Oracle stores [key,pkey] in IOT secondary indices, but rather some sort of alternative rowid).

araqnid