Maybe there's no simple answer to this question, but I ask in case someone has, if not a simple answer, at least an insight.

I've had a number of occasions where I write a loop that goes through many records in a database table, performing some update on each, and where I could legitimately either do one big commit at the end or commit each record as I process it. That is, committing one record at a time would not create any data integrity issues.

Is there a clear case for which is better?

What brings it to mind is that I recently switched one such program from a single big commit to a bunch of little commits, because it was a fairly long-running program -- about 80 minutes -- and it failed halfway through on bad data. I fixed the problem and re-ran it, but it had to start over from the beginning, when it could have just processed the previously unprocessed records.

I noticed when I made this change that the run time was about the same either way.
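For concreteness, the two patterns look roughly like this (a sketch using Python's sqlite3; the records table, value column, and processed flag are hypothetical):

    import sqlite3

    conn = sqlite3.connect("example.db")

    # Pattern 1: one big commit -- all-or-nothing, but a failure halfway
    # through loses all the work done so far.
    for (record_id,) in conn.execute("SELECT id FROM records WHERE processed = 0").fetchall():
        conn.execute("UPDATE records SET value = value + 1, processed = 1 WHERE id = ?",
                     (record_id,))
    conn.commit()

    # Pattern 2: commit per record -- a re-run after a failure can skip the
    # rows already marked processed and pick up where it left off.
    for (record_id,) in conn.execute("SELECT id FROM records WHERE processed = 0").fetchall():
        conn.execute("UPDATE records SET value = value + 1, processed = 1 WHERE id = ?",
                     (record_id,))
        conn.commit()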

+1  A: 

I think the answer is: do you need to roll back everything if one record fails? If yes, put the transaction outside the loop; otherwise, put it inside. Of course, I would almost never write a loop to do updates anyway, except to process fairly large batches of records. If you are doing row-by-row updates, there are better, more performant methods.

HLGEM
I'm not sure what you mean by "never would write a loop to do an update anyway except to process fairly large batches". Umm, as opposed to what? If I only had one record to update, of course I wouldn't have a loop.
Jay
And yes, if the whole update must be treated as a single transaction -- where doing only part of it would leave inconsistent data -- then of course the commit must be outside the loop. I was thinking of cases where, from a logical point of view, it doesn't matter.
Jay
As opposed to a set-based update. If you have 100 records to update, 99.9% of the time they can be done in one UPDATE statement with no loop, and they should be done that way. Databases are not optimized for row-by-row operations (see the sketch below).
HLGEM
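To illustrate the set-based point, here is a row-by-row loop next to its single-statement equivalent (a sketch using Python's sqlite3; the prices table and its columns are hypothetical):

    import sqlite3

    conn = sqlite3.connect("example.db")

    # Row-by-row: one statement per record.
    for (record_id,) in conn.execute("SELECT id FROM prices WHERE region = 'EU'").fetchall():
        conn.execute("UPDATE prices SET amount = amount * 1.1 WHERE id = ?", (record_id,))
    conn.commit()

    # Set-based: the same change in a single statement the database can optimize.
    conn.execute("UPDATE prices SET amount = amount * 1.1 WHERE region = 'EU'")
    conn.commit()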
+1  A: 

Assuming that the ability to roll back the entire set of changes is not needed (in which case there is only one answer: commit outside the loop), committing inside the loop keeps the transaction log smaller but requires more round trips to the DB. Committing outside the loop is the exact opposite. Which is faster depends on the average operation count and the total amount of data to be committed. For a routine that persists about 10-20 records, commit outside the loop. For 1-2 million records, I'd commit in batches.
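A batched variant might look like this (a sketch using Python's sqlite3; the batch size of 10,000 is an arbitrary starting point worth tuning):

    import sqlite3

    BATCH_SIZE = 10_000  # arbitrary; tune for your DB and row size

    conn = sqlite3.connect("example.db")
    ids = [row[0] for row in conn.execute("SELECT id FROM records WHERE processed = 0")]

    for i, record_id in enumerate(ids, start=1):
        conn.execute("UPDATE records SET processed = 1 WHERE id = ?", (record_id,))
        if i % BATCH_SIZE == 0:
            conn.commit()  # flush one batch; keeps the transaction log bounded
    conn.commit()          # commit the final partial batch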

KeithS
A: 

In terms of performance, it is generally better to do one big commit at the end (less network traffic and normally less work for the DB).

This of course depends on many factors, such as the indexing on the table, the amount of data, and so on.

What should drive your decision is how important each update is: should each one be a transaction in and of itself? Does updating many items as a unit make sense? What happens if the loop fails halfway through?

Answering those questions will give you the right approach for that process in your application -- you may arrive at different ways of handling the commit depending on the application context.
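For the all-or-nothing case, one way to make that choice explicit is to roll back on any failure (a sketch using Python's sqlite3; names are hypothetical):

    import sqlite3

    conn = sqlite3.connect("example.db")
    try:
        for (record_id,) in conn.execute("SELECT id FROM records").fetchall():
            conn.execute("UPDATE records SET value = value + 1 WHERE id = ?", (record_id,))
        conn.commit()    # all updates become visible together
    except Exception:
        conn.rollback()  # a halfway failure leaves the table untouched
        raise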

Oded