views:

41

answers:

2

I have one procedure which updates record values, and i want to fire it up against all records in table (over 30k records), procedure execution time is from 2 up to 10 seconds, because it depends on network load.

Now i'm doing UPDATE table SET field = procedure_name(paramns); but with that amount of records it takes up to 40 min to process all table.

Now im using 4 different connections witch fork to background and fires query with WHERE clause set to iterate over modulo of row id's to speed this up, ( WHERE id_field % 4 = ) and this works well and cuts down table populate to ~10 mins.

But i want to avoid using cron, shell jobs and multiple connections for this, i know that it can be done with libpq, but is there a way to fire up a query (4 different non-blocking queries) and do not wait till it ends execution, within single connection?

Or if anyone can point me out to some clues on how to write that function, using postgres internals, or simply in C and bound it as a stored procedure?

Cheers Darius

A: 

PHP has some functions for asynchrone queries:

  • pg_ send_ execute()
  • pg_ send_ prepare()
  • pg_send_query()
  • pg_ send_ query_ params()

No idea about other programming languages, you have to dig into the manuals.

Frank Heikens
Problem is that i do not use PHP within this project, DB itself is an app with all the core and business logic is build on top of plpgsql and plperl
canni
A: 

I've got a sure answer for this question - IF you will share with us what your ab workout is!!! I'm getting fat by the minute and I need answers myself...

OK I'll answer anyway.

If you are updating one table, on one database server, in 40 minutes 'single threaded' and in 10 minutes with 4 threads, the bottleneck is not the database server; otherwise, it would get bogged down in I/O. If you are executing a bunch of UPDATES, one call per record, the network round-trip time is killing you.

I'm pretty sure this is the case and not that it's either an I/O bottleneck on the DB or the possibility that procedure_name(paramns); is taking a long time. (If that were the procedure taking 2-10 seconds it would take like 2500 min to do 30K records). The reason I am sure is that starting 4 concurrent processed cuts the time in 1/4. So especially it is not an i/o issue on the DB server.

This might be the one excuse for putting business logic in an SP on the server. Optimization unfortunately means breaking the rules. The consequence is difficult maintenance. but, duh!!

However, the best solution would be to get this set up to use 'bulk update' queries. That might mean you have to take several strange and unintuitive steps such as this:

  • This will require a lot of modfication if multiple users can run it concurrently.
  • refactor the system so procedure_name(paramns) can get all the data it needs to process all records via a select statement. May need to use creative joins. If it's an SP of course now you are moving the logic to the client.
  • Use that have the program create an XML or other importable flat file format with the PK of the record to update, and the new field value or values. Write all the updates to this file instead of executing them on the DB.
  • have a temp table on the database that matches the layout of this flat file
  • run an import on the database - clear the temp table and import the file
  • do an update of a join of the temp table and the table to be updated, e.g., UPDATE mytbl, mytemp WHERE myPK=mytempPK SET myval=mytempnewval (use the right join syntax of course).
  • You can try some of these things 'by hand' first before you bother coding, to see if it's worth the speed increase.
  • If possible, you can still put this all in an SP!

I'm not making any guarantees, especially as I look down at my ever-fattening belly, but, this has the potential to melt your update job down to under a minute.

FastAl
reason of procedure long execution time is that it uses plperl function witch communicate over network using SNMP or raw sockets to return a result and table for end users is read-only, only internal DB stored procedures have access to update data, so it can be predicted when and how data will be updated
canni