views: 672
answers: 4

I have a utility in my application where I need to perform a bulk load of INSERT, UPDATE, and DELETE operations. I am trying to wrap the whole load in a transaction so that once the system is invoked and the data is fed to it, either all of it is applied to the database or none of it is.

The concern I have is: what are the boundary conditions here? How many INSERT, UPDATE, and DELETE statements can I have in one transaction? Is the transaction size configurable?

Any help would be appreciated.

-Thanks
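To make the all-or-none requirement concrete, here is a minimal sketch of the pattern. It uses Python's stdlib sqlite3 module purely as a stand-in for a PostgreSQL driver (the table name `items` and the sample rows are invented for illustration); the point is that every statement runs inside one transaction, and any failure rolls the whole batch back:

```python
import sqlite3

# All-or-none batch: every INSERT/UPDATE/DELETE runs inside one
# transaction; any failure rolls the entire batch back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

rows = [(1, "a"), (2, "b"), (3, "c")]
with conn:  # opens a transaction; commits on success, rolls back on error
    conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)", rows)
    conn.execute("UPDATE items SET name = 'b2' WHERE id = 2")
    conn.execute("DELETE FROM items WHERE id = 3")

# A failing batch leaves the table untouched:
try:
    with conn:
        conn.execute("INSERT INTO items VALUES (4, 'd')")
        conn.execute("INSERT INTO items VALUES (1, 'dup')")  # violates the PK
except sqlite3.IntegrityError:
    pass  # whole batch rolled back, including the (4, 'd') row

print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # 2
```

The same shape applies with a PostgreSQL driver: open one transaction, run all the statements, and commit only at the end.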

+2  A: 

I don't think there's a maximum amount of work that can be performed in a transaction. Data keeps getting added to the table files, and eventually the transaction either commits or rolls back: as I understand it, this outcome gets recorded in pg_clog, and if it rolls back, the space will eventually be reclaimed by VACUUM. So it's not as if the ongoing transaction's work is held in memory and flushed at commit time, for instance.

araqnid
This is only partially correct. Inside each transaction is a command counter that deals with visibility within the transaction. It is a 32-bit number that will eventually overflow if you have a very large transaction (billions of commands). VACUUM, pg_clog etc. only deal with the total number of transactions in the system, not with what happens inside one of them.
Magnus Hagander
A: 

For a project I work on, I performed 20 million INSERTs. I tried both one big transaction and one transaction per million INSERTs, and the performance seemed exactly the same.

PostgreSQL 8.3

bortzmeyer
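The comparison above is easy to reproduce in miniature. The sketch below, again using stdlib sqlite3 as a stand-in for PostgreSQL (the `bulk_insert` helper and table `t` are invented names), runs the same insert load either as one big transaction or as one transaction per chunk, so you can time both variants yourself:

```python
import sqlite3
import time

def bulk_insert(n_rows, chunk_size):
    """Insert n_rows, committing every chunk_size rows.

    With chunk_size >= n_rows this is one big transaction;
    with a smaller chunk_size it commits once per chunk.
    Returns (row_count, elapsed_seconds).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, payload TEXT)")
    start = time.perf_counter()
    for offset in range(0, n_rows, chunk_size):
        batch = [(i, "x" * 20)
                 for i in range(offset, min(offset + chunk_size, n_rows))]
        with conn:  # one transaction per chunk
            conn.executemany("INSERT INTO t VALUES (?, ?)", batch)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    return count, elapsed

print(bulk_insert(100_000, 100_000)[0])  # one big transaction
print(bulk_insert(100_000, 10_000)[0])   # commit every 10k rows
```

Whether the timings match as closely as bortzmeyer saw will depend on the database and the workload; the row counts, at least, come out the same either way.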
A: 

I believe the maximum amount of work is limited by your log file size. The database will never allow itself to get into a state where it cannot roll back, so if you consume all your log space during the transaction, it will halt until you give it more space or roll back. This is generally true for all databases.

I would recommend chunking your updates into manageable batches that take at most a couple of minutes of execution time each; that way you find out about a problem sooner (e.g. what normally takes 1 minute is still running after 10 minutes... hmmm, did someone drop an index?).

Glen
This is not true for PostgreSQL. It can recycle log space during a running transaction. If you are doing archive logging, you will obviously need space in the archive location, but for the local transaction log it's not necessary. (You will need the actual disk space for the data on disk, of course.)
Magnus Hagander
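Glen's "notice problems early" advice can be sketched as a chunked loop that times each batch and flags any batch that runs far longer than the first one. This is an illustrative pattern, not anything from the answers above; it again uses stdlib sqlite3, and the table `t`, the chunk size, and the 10x threshold are all made-up choices:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, 0)", [(i,) for i in range(10_000)])
conn.commit()

CHUNK = 1_000
baseline = None
for lo in range(0, 10_000, CHUNK):
    t0 = time.perf_counter()
    with conn:  # one transaction per chunk
        conn.execute("UPDATE t SET v = v + 1 WHERE id >= ? AND id < ?",
                     (lo, lo + CHUNK))
    took = time.perf_counter() - t0
    if baseline is None:
        baseline = took
    elif took > 10 * baseline:
        # A chunk that normally takes seconds is suddenly slow:
        # surface it now instead of discovering it hours later.
        print(f"chunk at {lo} took {took:.1f}s -- did someone drop an index?")

print(conn.execute("SELECT SUM(v) FROM t").fetchone()[0])  # 10000
```

Each chunk commits independently, so a failure partway through leaves the earlier chunks applied; that trade-off against the questioner's all-or-none requirement is exactly why the chunk size needs to be a deliberate choice.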
A: 

A single transaction can run approximately two billion commands (2^31, minus, IIRC, a tiny bit of overhead; actually, come to think of it, that may be 2^32, since the command counter is unsigned, I think).

Each of those commands can modify multiple rows, of course.

Magnus Hagander