views:

1423

answers:

4

I have been working on optimizing my Postgres databases recently, and traditionally, I've only ever use B-Tree indexes. However, I saw that GiST indexes suport non-unique, multicolumn indexes, in the Postgres 8.3 documentation.

I couldn't, however, see what the actual difference between them is. I was hoping that my fellow coders might beable to explain, what the pros and cons between them are, and more importantly, the reasons why I would use one over the other?

+2  A: 

GiST indexes are lossy to an extent, meaning that the DBMS has to deal with false positives/negatives, i.e.:

GiST indexes are lossy because each document is represented in the index by a fixed- length signature. The signature is generated by hashing each word into a random bit in an n-bit string, with all these bits OR-ed together to produce an n-bit document signature. When two words hash to the same bit position there will be a false match. If all words in the query have matches (real or false) then the table row must be retrieved to see if the match is correct. b-trees do not have this behavior, so depending on the data being indexed, there may be some performance difference between the two.

See for text search behavior http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html and http://www.postgresql.org/docs/8.3/static/indexes-types.html for a general purpose comparison.

Dana the Sane
+1  A: 

GiST are more general indexes. You can use them for broader purposes that the ones you would use with B-Tree. Including the ability to build a B-Tree using GiST.

I.E.: you can use GiST to index on geographical points, or geographical areas, something you won't be able to do with B-Tree indexes, since the only thing that matter on a B-Tree is the key (or keys) you are indexing on.

Pablo Santa Cruz
Can you explain in more detail how GiST works better for geographical points, or geographical areas - as this very much falls into what I am using the index for.
Ash
+5  A: 

In a nutshell: B-Tree indexes perform better, but GiST indexes are more flexible. Usually, you want B-Tree indexes if they'll work for your data type. There was a recent post on the PG lists about a huge performance hit for using GiST indexes; they're expected to be slower than B-Trees (such is the price of flexibility), but not that much slower... work is, as you might expect, ongoing.

From a post by Tom Lane, a core PostgreSQL developer:

The main point of GIST is to be able to index queries that simply are not indexable in btree. ... One would fully expect btree to beat out GIST for btree-indexable cases. I think the significant point here is that it's winning by a factor of a couple hundred; that's pretty awful, and might point to some implementation problem.

kquinn
+3  A: 

Basically everybody's right - btree is default index as it performs very well. GiST are somewhat different beasts - it's more of a "framework to write index types" than a index type on its own. You have to add custom code (in server) to use it, but on the other hand - they are very flexible.

Generally - you don't use GiST unless the datatype you're using tell you to do so. Example of datatypes that use GiST: ltree (from contrib), tsvector (contrib/tsearch till 8.2, in core since 8.3), and others.

There is well known, and pretty fast geographic extenstion to PostgreSQL - PostGIS (http://postgis.refractions.net/) which uses GiST for its purposes.

depesz