I'm having trouble getting Hibernate to perform a bulk insert on MySQL.

I'm using Hibernate 3.3 and MySQL 5.1

At a high level, this is what's happening:

@Transactional
public Set<Long> doUpdate(Project project, IRepository externalSource) {
    List<IEntity> entities = externalSource.loadEntites();
    buildEntities(entities, project);
    persistEntities(project);
}

public void persistEntities(Project project) {
    projectDAO.update(project);
}

This results in n log entries (1 for every row) as follows:

Hibernate: insert into ProjectEntity (name, parent_id, path, project_id, state, type) values (?, ?, ?, ?, ?, ?)

I'd like to see these inserts batched so the update performs better. This routine could generate tens of thousands of rows, and a database round trip per row is a killer.

Why isn't this getting batched? (It's my understanding that Hibernate is supposed to batch inserts by default where appropriate.)

+2  A: 

As documented in Chapter 13, Batch processing:

If you are undertaking batch processing you will need to enable the use of JDBC batching. This is absolutely essential if you want to achieve optimal performance. Set the JDBC batch size to a reasonable number (10-50, for example):

hibernate.jdbc.batch_size 20

Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator.
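That caveat matters here: MySQL auto_increment keys map to the identity generator, so if ProjectEntity is mapped that way, insert batching is silently turned off. A hedged sketch of switching to a table-based generator (the entity and id field names are assumed from the question):

import javax.persistence.*;

@Entity
public class ProjectEntity {
    @Id
    @GeneratedValue(strategy = GenerationType.TABLE) // any non-identity strategy keeps JDBC batching available
    private Long id;
    // remaining fields (name, parent_id, path, project_id, state, type) unchanged
}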

Don't forget to flush and then clear the session regularly, or you'll get an OutOfMemoryException, as documented in section 13.1, Batch inserts.
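For illustration, a minimal sketch of that flush/clear pattern, assuming a batch size of 20 and that the entities built in buildEntities() are saved through a plain Session (the sessionFactory field is an assumption):

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < entities.size(); i++) {
    session.save(entities.get(i));
    if (i > 0 && i % 20 == 0) { // same value as hibernate.jdbc.batch_size
        session.flush(); // execute the pending batch of inserts
        session.clear(); // evict persisted instances so the first-level cache stays small
    }
}
tx.commit();
session.close();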

But IMO, for tens of thousands of rows, you should consider using the StatelessSession interface.
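A minimal sketch of the StatelessSession approach, again assuming the same entities list as in the question; insert() goes straight to JDBC with no first-level cache, so no flush/clear bookkeeping is needed:

StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();
for (IEntity entity : entities) {
    session.insert(entity); // executed immediately, bypassing the persistence context
}
tx.commit();
session.close();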

Pascal Thivent
Will this enable batching of SQL statements? (Whether the session keeps state sounds rather orthogonal to batching, but perhaps I infer too much from the class name.)
meriton
@meriton I've updated my answer to add more details.
Pascal Thivent
+1: Now I get it, thanks!
meriton
A: 

Pascal has pretty much nailed it in the Hibernate context. As an alternative, you can use Spring's JdbcTemplate / BatchSqlUpdate. However, I have to warn you that Hibernate-cached instances may not reflect changes made that way. In our project we had to take precautionary measures to work around this by running those updates on a separate schedule (another problem created, but one within our control).
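For reference, a hedged sketch of what that might look like with Spring's BatchSqlUpdate; the SQL is taken from the INSERT shown in the question, while dataSource and the IEntity getters are assumptions for illustration:

import java.sql.Types;
import java.util.List;
import javax.sql.DataSource;
import org.springframework.jdbc.core.SqlParameter;
import org.springframework.jdbc.object.BatchSqlUpdate;

public void bulkInsert(DataSource dataSource, List<IEntity> entities) {
    BatchSqlUpdate insert = new BatchSqlUpdate(dataSource,
            "insert into ProjectEntity (name, parent_id, path, project_id, state, type) "
                    + "values (?, ?, ?, ?, ?, ?)");
    insert.declareParameter(new SqlParameter(Types.VARCHAR)); // name
    insert.declareParameter(new SqlParameter(Types.BIGINT));  // parent_id
    insert.declareParameter(new SqlParameter(Types.VARCHAR)); // path
    insert.declareParameter(new SqlParameter(Types.BIGINT));  // project_id
    insert.declareParameter(new SqlParameter(Types.VARCHAR)); // state
    insert.declareParameter(new SqlParameter(Types.VARCHAR)); // type
    insert.setBatchSize(50); // rows queued before a JDBC batch is executed

    for (IEntity e : entities) {
        // getters are hypothetical; adapt to the real IEntity API
        insert.update(new Object[] { e.getName(), e.getParentId(), e.getPath(),
                e.getProjectId(), e.getState(), e.getType() });
    }
    insert.flush(); // execute any rows still queued
}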

questzen
+1  A: 

Pascal's answer is correct. However, because you are using MySQL, I also highly recommend that you try using the rewriteBatchedStatements=true parameter in your JDBC URL.
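For example (host and database name are placeholders):

jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true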

This parameter causes the JDBC driver to dynamically re-write your INSERT batches to use a single "multi-valued" INSERT, e.g.:

INSERT INTO mytable (mycol) VALUES (0);
INSERT INTO mytable (mycol) VALUES (1);
INSERT INTO mytable (mycol) VALUES (2);

will be re-written to:

INSERT INTO mytable (mycol) VALUES (0), (1), (2);

This can make a significant difference in some cases. See http://www.jroller.com/mmatthews/entry/speeding_up_batch_inserts_for for some example measurements.

Matt Solnit