views:

697

answers:

2

Hi

I have written a method insert() in which I am trying to use JDBC Batch for inserting half a million records into a MySQL database:

public void insert(int nameListId, String[] names) {
        String sql = "INSERT INTO name_list_subscribers (name_list_id, name, date_added)"+
                     " VALUES (?, ?, NOW())";
        Connection conn = null;
        PreparedStatement ps = null;

        try{
            conn = getConnection();
            ps = conn.prepareStatement(sql);

            for(String s : names ){
                ps.setInt(1, nameListId); 
                ps.setString(2, s);
                ps.addBatch();
            }

            ps.executeBatch();

        }catch(SQLException e){
            throw new RuntimeException(e);
        }finally{
            closeDbResources(ps, null, conn);
        }
    }

But whenever I try to run this method, I get the following error:

java.lang.OutOfMemoryError: Java heap space
    com.mysql.jdbc.ServerPreparedStatement$BatchedBindValues.<init>(ServerPreparedStatement.java:72)
    com.mysql.jdbc.ServerPreparedStatement.addBatch(ServerPreparedStatement.java:330)
    org.apache.commons.dbcp.DelegatingPreparedStatement.addBatch(DelegatingPreparedStatement.java:171)

If I replace ps.addBatch() with ps.executeUpdate() and remove ps.executeBatch(), it works fine, though it takes some time. Please let me know if you know if using Batch is appropriate in this situation, and if it is, then why does it give OurOfMemoryError?

Thanks

A: 

It is out of memory because it hold all the transaction in memory and only send it over to the database when you call executeBatch.

If you don't need it to be atomic and would like the get better performance, you can keep a counter and call executeBatch every n number of records.

DJ
and what should be the value of n?
craftsman
The value is up to you, you have to benchmark your application to get the best value for that you want for the trade off between memory and performance.
DJ
+3  A: 

addBatch and executeBatch give you the mechanism to perform batch inserts, but you still need to do the batching algorithm yourself.

If you simply pile every statement into the same batch, as you are doing, then you'll run out of memory. You need to execute/clear the batch every n records. The value of n is up to you, JDBC can't make that decision for you. The larger the batch size, the faster things will go, but too large and you'll get memory starvation and things will slow down or fail. It depends how much memory you have.

Start off with a batch size of 1000, for example, and experiment with different values from there.

final int batchSize = 1000;
int count = 0;
for(String s : names ) {
   ps.setInt(1, nameListId); 
   ps.setString(2, s);
   ps.addBatch();

   if (++count % batchSize == 0) {
      ps.executeBatch();
      ps.clearBatch(); //not sure if this is necessary
   }
}
ps.executeBatch();   // flush the last few records.
skaffman