I have a big table with multiple indexes in Postgres. It has indexes on db_timestamp, id, and username.

I want to find the MAX timestamp for a particular username. The problem is that a simple query like

SELECT MAX(db_timestamp) FROM Foo WHERE username = 'foo'

takes a very long time because of the table size (we are talking about a 450 GB table with over 30 GB of indexes).

Is there any way to optimize this query, or to tell Postgres which query plan to use?

+1  A: 

PostgreSQL can't make efficient use of an index on (db_timestamp, id, username) to satisfy that query: the column you filter on has to be a prefix of the index, i.e. one of its leading column(s).

So an index on (username, db_timestamp) would serve that query very well, since it only has to scan the subtree (username, 0)..(username, +inf) (and IIRC PostgreSQL should actually know to find (username, +inf) and walk backwards in index order).
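A minimal sketch of that index and of an equivalent query form, assuming the table is named foo as in the question. The index name foo_username_db_timestamp_idx is made up for illustration, and CONCURRENTLY is optional: it avoids blocking writes while the index builds on a table this size, at the cost of a slower build.

```sql
-- Hypothetical index name. Column order matters: username first, then db_timestamp.
-- Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY foo_username_db_timestamp_idx
    ON foo (username, db_timestamp);

-- Reads the last index entry for this username instead of aggregating.
-- The IS NOT NULL filter is needed because DESC sorts NULLs first by
-- default, while MAX() ignores NULLs.
SELECT db_timestamp
FROM foo
WHERE username = 'foo'
  AND db_timestamp IS NOT NULL
ORDER BY db_timestamp DESC
LIMIT 1;
```

One difference from the MAX() form: when no row matches, this query returns zero rows, whereas MAX() returns a single row containing NULL.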

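In general, "covering indices" aren't as useful a technique with PostgreSQL as they are with other databases, because PostgreSQL needs to refer to the heap tuples for visibility information. (Index-only scans, added in PostgreSQL 9.2, avoid the heap visit only for pages marked all-visible.)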

araqnid
+1  A: 

Create an index on username and db_timestamp with the correct sort order:

CREATE INDEX idx_foo ON foo (username ASC, db_timestamp DESC);

Check EXPLAIN to see if things work as they should.
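As a rough sketch of that check, assuming the idx_foo index above has been created, the question's query can be run under EXPLAIN:

```sql
-- Shows the plan the planner would choose, without running the query.
EXPLAIN
SELECT MAX(db_timestamp) FROM foo WHERE username = 'foo';

-- Actually runs the query and reports real timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT MAX(db_timestamp) FROM foo WHERE username = 'foo';
```

If the plan shows an index scan on idx_foo rather than a sequential scan on foo or a scan of one of the single-column indexes, the new index is being used. The exact plan shape depends on the PostgreSQL version and on table statistics.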

Frank Heikens
If I add such an index and get rid of the individual indexes, is this going to affect other queries that involve only, say, username or db_timestamp individually?
Sujit
It might; I have no idea what kind of queries you execute. Any query that might benefit from an index on just db_timestamp (or one starting with db_timestamp) will not use the above index. Only queries that start with a condition or sort order on username can also benefit from the db_timestamp column in the index. Check EXPLAIN to see how your queries are executed.
Frank Heikens
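Before dropping any of the individual indexes, it may help to see how often each one is actually used. A sketch using the standard pg_stat_user_indexes statistics view, assuming the table is named foo:

```sql
-- Lists every index on foo with its scan count and on-disk size.
-- idx_scan is cumulative since the statistics were last reset.
SELECT indexrelname AS index_name,
       idx_scan     AS scans_since_stats_reset,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'foo'
ORDER BY idx_scan;
```

A low scan count is only a hint, not proof that an index is safe to drop: an index may back a unique or primary key constraint, or serve a rare but important query.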