ansaurus

Question

Answer 1

+1 A:

The solution would be to enable JDBC batching and to flush and clear the EntityManager at regular intervals (the same than the batch size) but I'm not aware of a vendor neutral way to do this:

With Hibernate, you'd have to set the hibernate.jdbc.batch_size configuration option. See Chapter 13. Batch processing
With EclipseLink, it looks like there is a batch writing mode. See Jeff Sutherland's post in this thread (it should also be possible to specify the size).
According to the comments of this blog post, batch writing is not available in TopLink Essentials :(

Pascal Thivent 2010-06-24 02:41:34

thanks for the response! Will post what I've done below!

2010-06-25 15:38:08

Answer 2

+1 A:

Thanks Pascal for the response. I've done some tests and I was able to significantly increase the performance.

With no optimizations i had an insert taking approximately 1100ms. Using eclipselink I added to the persistence.xml:

   <property name="eclipselink.jdbc.batch-writing" value="JDBC"/>
   <property name="eclipselink.jdbc.batch-writing.size" value="1000"/>

I tried the other properties (Oracle-JDBC etc) but JDBC appeared to give the best performance increase. That brought the insert down to approximately 900ms. So a fairly modest performance increase of 200ms. A big savings came from increasing the sequence allocationSize. I'm not a huge fan of doing this. I find it dirty to increase the INCREMENT BY of my sequences just to accommodate JPA. Increasing these brought the time down to approximately 600ms for each insert. So a total of about 500 ms were shaved off with those enhancements.

All this is fine and dandy, but it's still significantly slower than JDBC batch. JPA is a pretty high price to pay for ease of coding.

2010-06-25 16:22:09

Thanks for the feedback. I should have noticed the `allocationSize`. +1

Pascal Thivent 2010-06-25 16:27:34

Answer 3

+1 A:

Curious why you find increasing the INCREMENT BY as dirty? It is an optimization that reduces the number of calls to the database to retrieve the next sequence value and is a common pattern used in database clients where the id value is assigned in the client prior to INSERT. I don't see this as a JPA or ORM issue and should be the same cost in your JDBC comparison since it must also retrieve a new sequence number for each new row prior to INSERT. If you have a different approach in your JDBC case then we should be able to get EclipseLink JPA to follow the same approach.

The cost of JPA is probably most obvious on the isolated INSERT scenario because you are not gaining any benefit from repeated reads on a transactional or shared cache and depending on your cache configuration you are paying a price to put these new entities into the cache within the flush/commit.

Please note that there is also a cost to creating the first EntityManager where all of the metadata processing, class-loading, possibly weaving, and metamodel initialization. Make sure you keep this time out of your comparison. In your real application this occurs once and all subsequent EntityManager benefit from the shared metadata.

If you have other scenarios that need to read these entities then the cost of putting them in the cache can reduce their cost of retrieving them. In my experience I can make an application overall much faster then a typical hand-written JDBC solution but its a balance across the entire set of concurrent users and not on an isolated test case.

I hope this helps. Happy to provide any more guidance and EclipseLink JPA and its performance and scalability options.

Doug

Doug Clarke 2010-07-05 14:54:27

ansaurus

tags:

views:

answers:

JPA inserts slow with an object graph

related questions