I have been tasked with improving the performance of a slow running process which updates some data in a PostGres 8.3 database (running on Solaris, updates are driven by Perl 5.8 scripts through SOAP). About 50% of the time consumed I have very little control over so tuning my 50% is quite important.
There are usually about 4,500,000 rows in the table although I've seen it bloat out to about 7,000,000. The id that the update is querying on (not primary or unique) has just under 9000 distinct values and the spread of occurrences is weighted heavily towards 1 per id (median value is 20, max value 7000).
There is an index on this id but with such sparse data I wonder if there's a better way of doing things. I'm also considering denormalising things a bit (database is not super-normalised anyway) & pulling data out into a separate table (probably controlled/maintained by triggers) to help speed things up.
So far, I have made some pretty basic tweaks (not pinging the database every n seconds to see if it's alive, not setting session variables unnecessarily etc) and this is helping but I really feel that there's something I'm missing with the data ...
Even if someone says that pulling relevant data out into a separate table is an excellent/terrible idea that would be really helpful! Any other ideas (or further questions for clarification) gratefully received!
Query:
UPDATE tab1 SET client = 'abcd', invoice = 999
WHERE id = 'A1000062' and releasetime < '02-11-09'::DATE
AND charge IS NOT NULL AND invoice IS NULL AND client IS NULL;
I realise the 'is not null' is far from ideal. Id is indexed as are invoice & client (btrees, so I understand PostGres will/should/can use an index there). It's a pretty trivial query ...
Query plan (explain with analyze):
Bitmap Heap Scan on tab1 (cost=17.42..1760.21 rows=133 width=670) (actual time=0.603..0.603 rows=0 loops=1)
Recheck Cond: (((id)::text = 'A1000062'::text) AND (invoice IS NULL))
Filter: ((charge IS NOT NULL) AND (client IS NULL) AND (releasetime < '2009-11-02'::date))
-> Bitmap Index Scan on cdr_snapshot_2007_09_12_snbs_invoice (cost=0.00..17.39 rows=450 width=0) (actual time=0.089..0.089 rows=63 loops=1)
Index Cond: (((snbs)::text = 'A1000062'::text) AND (invoice IS NULL))
Total runtime: 0.674 ms
Autovacuum is, I believe, enabled. There are no foreign key constraints but thanks for the tip as I didn't know that.
I am really liking the idea of increasing the statistics value - I'll be having a play around with that straight away.