
I have a table with hundreds of millions of rows, with a schema like the one below.

table AA {
 id integer primary key,
 prop0 boolean not null,
 prop1 boolean not null,
 prop2 smallint not null,
 ...
}

Each "property" field (prop0, prop1, ...) has a small number of distinct values, and I usually query for "id" given conditions on the property fields. I think a bitmap index would be best for this kind of query, but PostgreSQL does not seem to support bitmap indexes.

I tried a B-tree index on each field, but according to EXPLAIN these indexes are not used.

Is there a good alternative way to do this?

(I'm using PostgreSQL 9.)

A: 

Your real problem is a bad schema design, not the index. The properties should be placed in a separate table, and your current table should link to it using a many-to-many relation.

The BIT datatype might also be of use; just check the manual.
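A minimal sketch of such a normalized layout (the table and column names are only illustrative):

create table properties (
    id    serial primary key,
    name  text not null,   -- e.g. 'prop0'
    value text not null,   -- e.g. 'true' or '3'
    unique (name, value)
);

create table aa (
    id integer primary key
    -- other non-property columns
);

-- junction table holding the many-to-many relation
create table properties_aa (
    property_id integer not null references properties (id),
    aa_id       integer not null references aa (id),
    primary key (property_id, aa_id)
);

create index properties_aa_aa_id_idx on properties_aa (aa_id);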

Frank Heikens
All properties are orthogonal to each other, so if I normalize N properties there would be N tables. Are you saying I should group the N properties into m groups, make m tables holding the permutations of the group members, and then link the AA table to these m tables to increase the cardinality of each field?
tk
No, I'm saying you need the tables "properties", "aa" and "properties_aa". The last table only holds the relations between the properties and your aa table. This table will be huge, but it can be indexed. Booleans are almost impossible to index; you only have 3 options: NULL, FALSE and TRUE. The IDs in the properties_aa table are much better candidates.
Frank Heikens
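With the hypothetical properties_aa layout sketched above, finding the ids that match several property values could look roughly like this (a relational-division style query; the specific names are assumptions):

-- find ids of aa rows that have all three requested property values
select pa.aa_id
from properties_aa pa
join properties p on p.id = pa.property_id
where (p.name, p.value) in (('prop0', 'true'),
                            ('prop1', 'false'),
                            ('prop2', '3'))
group by pa.aa_id
having count(distinct p.id) = 3;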
A: 

An index is only used if it actually speeds up the query, which is not always the case. Especially with smallish tables (say, thousands of rows), a full table scan ("seq scan" in the Postgres execution plan) might indeed be a lot faster.

How many rows did the table have when you tried the statement? What did the query look like? Maybe there are other conditions that prevent the index from being used. Did you ANALYZE the table so the statistics are up to date?
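To check that, you can refresh the statistics and look at the actual plan, for example (the column names are taken from the question; the rest is only an illustration):

analyze aa;

explain analyze
select id
from aa
where prop0 = true
  and prop1 = false
  and prop2 = 3;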

a_horse_with_no_name
As I mentioned above, the number of rows is in the hundreds of millions. I usually run 'select id from AA where prop0 = true and prop1 = false and prop2 = 3 and ...'. I analyzed the table to update the statistics.
tk
Can you post the execution plan, ideally the output of EXPLAIN ANALYZE? How are the true/false values distributed? A seq scan is usually faster than an index lookup if more than ~20% of all rows are selected, so if prop0 = true yields half of all rows, there is no benefit in using the index. If you always filter on all columns, a composite index on them will probably make more sense than one index per property.
a_horse_with_no_name
A: 

Create a multicolumn index on the properties that are always or almost always queried, or several multicolumn indexes if needed.
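For example, assuming the query from the question always filters on prop0, prop1 and prop2:

create index aa_props_idx on aa (prop0, prop1, prop2);

-- a query filtering on these columns can then use the index:
select id from aa where prop0 = true and prop1 = false and prop2 = 3;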

The alternative, when you do not almost always query the same properties, is to add a tsvector column with words describing your data, maintained by a trigger. For example

prop0=true
prop1=false
prop2=4

would be

'propzero nopropone proptwo4'::tsvector

index it using GIN and then use full text search for searching:

where tsv @@ 'propzero & nopropone & proptwo4'::tsquery
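A rough sketch of how that could be wired up (the column name tsv, the trigger function name and the exact word encoding are assumptions, not a fixed recipe):

alter table aa add column tsv tsvector;

create or replace function aa_tsv_update() returns trigger as $$
begin
    -- encode each property value as a single searchable word
    new.tsv :=
        (case when new.prop0 then 'propzero' else 'nopropzero' end || ' ' ||
         case when new.prop1 then 'propone'  else 'nopropone'  end || ' ' ||
         'proptwo' || new.prop2)::tsvector;
    return new;
end;
$$ language plpgsql;

create trigger aa_tsv_trigger
    before insert or update on aa
    for each row execute procedure aa_tsv_update();

create index aa_tsv_idx on aa using gin (tsv);

-- querying:
select id from aa
where tsv @@ 'propzero & nopropone & proptwo4'::tsquery;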
Tometzky