ansaurus

Question

Stream with a lot of UPDATEs and Postgres

Answer 1

+5 A:

You want to assemble statistical updates as they happen into an in-memory queue of some kind, or alternatively onto a message bus if you're more ambitious. A receiving process then aggregates these statistical updates on a periodic basis - which can be anywhere from every 5 seconds to every hour - depends on what you want. The counts of bytes_received and bytes_sent are then updated, with counts that may represent many individual "update" messages summed together. Additionally you should batch the update statements for multiple ids into a single transaction, ensuring that the update statements are issued in the same relative order with regards to primary key to prevent deadlocks against other transactions that might be doing the same thing.

In this way you "batch" activities into bigger chunks to control how much load is on the PG database, and also serialize many concurrent activities into a single stream (or multiple, depending on how many threads/processes are issuing updates). The tradeoff which you tune based on the "period" is, how much freshness vs. how much update load.

zzzeek 2009-11-02 18:52:07

Answer 2

+4 A:

The database really isn't the best tool for collecting lots of small updates, but as I don't know your queryability and ACID requirements I can't really recommend something else. If it's an acceptable approach the application side update aggregation suggested by zzzeek can help lower the update load significantly.

There is an similar approach that can give you durability and ability to query the fresher data at some performance cost. Create a buffer table that can collect the changes to the values that need to be updated and insert the changes there. At regular intervals in a transaction rename the table to something else and create a new table in place of it. Then in a transaction aggregate all the changes, do the corresponding updates to the main table and truncate the buffer table. This way if you need a consistent and fresh snapshot of any data you can select from the main table and join in all the changes from the active and renamed buffer tables.

However if neither is acceptable you can also tune the database to deal better with heavy update loads.

To optimize the updating make sure that PostgreSQL can use heap-only tuples to store the updated versions of the rows. To do this make sure that there are no indexes on the frequently updated columns and change the fillfactor to something lower from the default 100%. You'll need to figure out a suitable fill factor on your own as it depends heavily on the details of the workload and the machine it is running on. The fillfactor needs to be low enough that allmost all of the updates fit on the same database page before autovacuum has the chance to clean up the old non-visible versions. You can tune autovacuum settings to trade off between the density of the database and vacuum overhead. Also, take into account that any long transactions, including statistical queries, will hold onto tuples that have changed after the transaction has started. See the pg_stat_user_tables view to see what to tune, especially the relationship of n_tup_hot_upd to n_tup_upd and n_live_tup to n_dead_tup.

Heavy updating will also create a heavy write ahead log (WAL) load. Tuning the WAL behavior (docs for the settings) will help lower that. In particular, a higher checkpoint_segments number and higher checkpoint_timeout can lower your IO load significantly by allowing more updates to happen in memory. See the relationship of checkpoints_timed vs. checkpoints_req in pg_stat_bgwriter to see how many checkpoints happen because either limit is reached. Raising your shared_buffers so that the working set fits in memory will also help. Check buffers_checkpoint vs. buffers_clean + buffers_backend to see how many were written to satisfy checkpoint requirements vs. just running out of memory.

Ants Aasma 2009-11-02 20:18:35

ansaurus

tags:

views:

answers:

Stream with a lot of UPDATEs and Postgres

related questions