views:

479

answers:

3

I have a Java program that in some circumstances must update a large number of records in a database (e.g. 100,000).

It does this by creating a PreparedStatement and using the addBatch technique. Here is the snippet:

connection.setAutoCommit(false);
PreparedStatement ps = connection.prepareStatement(
        "UPDATE myTable SET colName=? where id=?");

for (...) { // this loop can be 100,000 iterations long
    colValue = ...
    id = ...
    ps.setString(1, colValue);
    ps.setString(2, id);
    ps.addBatch();
}

ps.executeBatch();
connection.commit();

Is this the best (fastest) way to update 100,000 records in JDBC?

Could anybody suggest a better way?

A: 

You should use Spring Batch operations with the JdbcTemplate
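
For reference, a minimal sketch of what that could look like using JdbcTemplate's batchUpdate with a BatchPreparedStatementSetter (the MyTableBatchUpdater and Row names are made up for illustration; transaction handling is left to Spring's usual mechanisms):

import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

import javax.sql.DataSource;

import org.springframework.jdbc.core.BatchPreparedStatementSetter;
import org.springframework.jdbc.core.JdbcTemplate;

public class MyTableBatchUpdater {

    private final JdbcTemplate jdbcTemplate;

    public MyTableBatchUpdater(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    // "Row" is a hypothetical holder for the values the question's loop computes.
    public void updateAll(final List<Row> rows) {
        jdbcTemplate.batchUpdate(
            "UPDATE myTable SET colName=? WHERE id=?",
            new BatchPreparedStatementSetter() {
                public void setValues(PreparedStatement ps, int i) throws SQLException {
                    Row row = rows.get(i);
                    ps.setString(1, row.colValue);
                    ps.setString(2, row.id);
                }
                public int getBatchSize() {
                    return rows.size(); // number of statements in the batch
                }
            });
    }

    public static class Row {
        public String colValue;
        public String id;
    }
}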

javaExpert
But he doesn't want to use Spring
Dan
But maybe he should, Spring is all the rage these days
javaExpert
+2  A: 

Try this as a benchmark:

  1. Use the built-in SQL tools to do a bulk extract of the entire table. All rows. All columns.

  2. Drop (or rename) the table.

  3. Use a simple flat-file read/write to create a new file with the updates applied.

  4. Use the bulk-load utility that comes with your database to rebuild the entire table from the extracted file.

  5. Add indexes after the reload.

You may find that this is faster than any SQL solution. We stopped using UPDATEs for a data warehouse because extract -> flat-file process -> load was much faster than SQL.
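
To make step 3 concrete, here is a rough Java sketch, assuming a pipe-delimited extract with the id in the first column and colName in the second, and the new values available in a map keyed by id (the file layout, delimiter, and names are assumptions, not details from your schema):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Map;

public class FlatFileUpdater {

    // Reads the bulk-extracted file line by line, replaces colName (assumed to be
    // the second field) for any id present in the updates map, and writes a new
    // file that the bulk-load utility can reload.
    public static void applyUpdates(String extractFile, String outputFile,
                                    Map<String, String> updates) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(extractFile));
        BufferedWriter out = new BufferedWriter(new FileWriter(outputFile));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split("\\|", -1); // assumed pipe-delimited extract
                String id = fields[0];                   // assumed id is the first column
                String newValue = updates.get(id);
                if (newValue != null) {
                    fields[1] = newValue;                // assumed colName is the second column
                }
                out.write(join(fields, '|'));
                out.newLine();
            }
        } finally {
            in.close();
            out.close();
        }
    }

    private static String join(String[] fields, char sep) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(sep);
            sb.append(fields[i]);
        }
        return sb.toString();
    }
}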

S.Lott
It may be a rubbish answer but there are nicer ways of saying that.
Dan
@dan: you are right, Dan. I was too harsh. @S.Lott: thanks for your unhelpful answer. I wonder how you manage to get so many points on StackOverflow. Surely the guys at StackOverflow should consider reworking their algorithms.
zerohibernation
Clearly S.Lott thinks you can get closer to an answer if you do a benchmark. Maybe you asked because you didn't know how to do it, or because you didn't have time to learn enough about your DB. If there's a definitive answer, someone will write it. If there's not, you'll have one not-so-helpful answer. And you know what? I'm voting for it because it encourages self-learning and understanding beyond a simple, questionable answer. We are programmers. We make things work.
helios
@zerohibernation: (1) It worked for me. (2) It may not work for you. (3) I'm not sure how much more detail you need. Code? (4) If you don't "benchmark", all you're doing is taking my word for it. (5) You can ignore my answer politely.
S.Lott
@s.lott: I take it back. His answer is not too bad. Sincere apologies. @helios: you say: "I'm voting for it because it encourages self-learning and understanding beyond a simple, questionable answer." So if somebody replies 'go and find it out yourself', you could vote for them as well.
zerohibernation
@zerohibernation: I don't think I said go and find out for yourself. I think I suggested benchmarking the algorithm I described. I'm not sure, but there seems to be a difference. Perhaps my answer was unclear?
S.Lott
+1 for measuring. Whether dropping and rebuilding the indexes increases performance depends on the number of rows already in the table and whether they have to be preserved.
stacker
A: 

Since batching uses buffering on the client side and then sends everything as a single request, it might be wise to execute batches of 5,000 rows. You should watch your memory consumption when adding 100,000 rows.

Sometimes it is faster to push data in several loads instead of one single load (using JDBC, at least based on my previous experience).
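
Applied to the statement from the question, a chunked batch could look roughly like this (the Map input stands in for your loop, and 5,000 is just the size suggested above, not a measured optimum):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

public class ChunkedBatchUpdate {

    private static final int BATCH_SIZE = 5000; // flush every 5,000 rows, as suggested above

    // newValuesById maps id -> new colName value, standing in for the question's loop body.
    public static void update(Connection connection, Map<String, String> newValuesById)
            throws SQLException {
        connection.setAutoCommit(false);
        PreparedStatement ps = connection.prepareStatement(
                "UPDATE myTable SET colName=? WHERE id=?");
        try {
            int count = 0;
            for (Map.Entry<String, String> entry : newValuesById.entrySet()) {
                ps.setString(1, entry.getValue());
                ps.setString(2, entry.getKey());
                ps.addBatch();
                if (++count % BATCH_SIZE == 0) {
                    ps.executeBatch(); // send this chunk, freeing the client-side buffer
                }
            }
            ps.executeBatch();         // flush the remaining rows
            connection.commit();
        } finally {
            ps.close();
        }
    }
}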

adrian.tarau