I've been reading a couple of docs on Postgres memory allocation configuration, but I need a little help.

My process runs thousands of SELECT SUM(x) FROM tbl WHERE ??? style queries, some of which take 10-30 seconds to run. The combined runtime for these queries adds up to multiple days in some cases.

Besides that, I have a few statements which populate summary tables from raw tables, in the form: INSERT INTO sum_tbl SELECT ... FROM raw_tbl WHERE ??? GROUP BY ???

Since the raw tables are quite large (up to 40 million rows), these summary queries can take hours to run.
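
For illustration, those summary statements have roughly this shape (table and column names here are made up, not my real schema):

  INSERT INTO sum_tbl (acct_id, month, total)
  SELECT acct_id, date_trunc('month', created), SUM(x)
  FROM raw_tbl
  WHERE created >= '2004-03-01' AND created < '2004-04-01'
  GROUP BY 1, 2;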

The process is single-threaded, so only one query runs at a time; however, I do run up to 4 processes in parallel, meaning there will only be 1-4 concurrent connections.

The computer has 3.2 GB of RAM, most of which is free.

With all that in mind, which of the memory settings are most applicable?

My understanding is that work_mem looks like the most relevant one.

Thanks.

EDIT - added EXPLAIN ANALYZE of one query; similar queries are run 1,053,000 times.

"Nested Loop  (cost=0.00..62869.18 rows=6 width=17) (actual time=1812.655..32761.845 rows=30 loops=1)"
"  ->  Seq Scan on table2 tt  (cost=0.00..1103.74 rows=2212 width=5) (actual time=0.017..29.231 rows=1527 loops=1)"
"        Filter: (pp AND (dd = 1200::numeric) AND ((type)::text = 'setup'::text))"
"  ->  Index Scan using idx_table1 on table1 t  (cost=0.00..27.91 rows=1 width=20) (actual time=21.432..21.432 rows=0 loops=1527)"
"        Index Cond: ((t.t_id = tt.id) AND (t.st = 520000::numeric) AND (t.ta = 2300000::numeric))"
"        Filter: (date_trunc('month'::text, t.start) = '2004-03-01 00:00:00'::timestamp without time zone)"
"Total runtime: 32761.983 ms"
+1  A: 
  1. work_mem is relevant; however, keep in mind it's per connection (see the sketch after this list).
  2. Make sure you have the needed indexes in place.
  3. EXPLAIN ANALYZE is your friend. Can you post it here?
  4. Disk setup (physical) is important here. If you can use more spindles, use them.
  5. The best docs are, as always, here and here (if you write a lot).
  6. (Feeling brave) Are you sure you need an RDBMS here?
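
For points 1 and 3, a rough sketch of what I mean (the 64MB figure is just a placeholder - tune it to your RAM and number of concurrent connections):

  -- raise work_mem only for the session running the heavy reports;
  -- SET affects the current connection only, other backends keep the server default
  SET work_mem = '64MB';

  -- then capture the plan of one of the slow queries and post it here
  EXPLAIN ANALYZE
  SELECT SUM(x) FROM tbl WHERE ...;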

edit:

After some thought and the discussion in the comments below, the only reasonable advice besides "throw more hardware at it" is "build a data warehouse", which means:

  • Build a data mart - with one or more aggregate tables - which will satisfy your reporting needs

  • Establish ETL processes to allow incremental data mart updates.

(But I'm not able to say whether that's possible based on your problem description - this needs deeper analysis, especially of the business meaning of these queries.)
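
A minimal sketch of the data mart idea, assuming the reports are monthly sums keyed by a single dimension (all names here are made up - the real dimensions are exactly what needs that deeper analysis):

  CREATE TABLE monthly_totals (
      t_id   integer,
      month  date,
      total  numeric,
      PRIMARY KEY (t_id, month)
  );

  -- incremental ETL: re-aggregate only the period touched since the last load
  DELETE FROM monthly_totals WHERE month >= '2004-03-01';

  INSERT INTO monthly_totals (t_id, month, total)
  SELECT t_id, date_trunc('month', start)::date AS month, SUM(x)
  FROM raw_tbl
  WHERE start >= '2004-03-01'
  GROUP BY 1, 2;

The thousands of SELECT SUM(x) queries would then hit monthly_totals instead of scanning the 40-million-row raw tables.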

filiprem
Just to be perfectly clear - work_mem is *not* per connection, it's per *operation*. So there may be more than 1 x work_mem per connection.
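
For illustration (tables and columns made up), a single query can have several such operations live at once, each one allowed its own work_mem:

  SELECT t.id, SUM(t.x)
  FROM t JOIN u ON u.id = t.u_id   -- hash join: one work_mem allocation
  GROUP BY t.id                    -- hash aggregate: another
  ORDER BY 2 DESC;                 -- sort: a third

so a single backend can need a multiple of the setting.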
Magnus Hagander
1. I have bumped work_mem up to 10MB, 2. I think I have, 3. added to the question above, 4. we are working on getting access to new hardware but are currently on a single SATA disk, 5. thx, 6. wtf? Pretty sure I don't have an alternative.
pstanton
Is it possible that increasing work_mem slows processing even if it doesn't reach the swapping point? I'm only using a third of my memory, but if anything I've noticed a slowdown!
pstanton
Increasing work_mem should not slow processing. How big and how old is this database? Which Pg version is it? Did you try REINDEX?
filiprem
Postgres 8.3. The data is very new, and I have run ANALYZE on all tables prior to reporting. Should I run REINDEX too? As stated, some tables contain 40 million records.
pstanton
Just read the doc for REINDEX - I don't think it will help, as the index hasn't changed and all the data was inserted after the index was created.
pstanton