views:

484

answers:

2

I have 5 MySQL InnoDB tables: Test, InputInvoice, InputLine, OutputInvoice, and OutputLine, and each is mapped and functioning in Hibernate. I have experimented with StatelessSession vs. Session, and with the JDBC batch size. I have removed any generator classes to let MySQL handle id generation, but it is still performing quite slowly. Each of those tables is represented by a Java class and mapped in Hibernate accordingly. Currently, when it comes time to write the data out, I loop through the objects and call session.save(Object), or session.insert(Object) if I'm using StatelessSession. I also do a flush and clear (when using Session) when my line count reaches the max JDBC batch size (50).
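For reference, the write loop described above might look like the following sketch. `Session` is stubbed here with a minimal interface so the flush/clear cadence can be shown self-contained; real code would use `org.hibernate.Session` and the entity classes from the question:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSaveSketch {
    // Minimal stand-in for org.hibernate.Session, just enough to show the cadence.
    interface MiniSession {
        void save(Object entity);
        void flush();
        void clear();
    }

    static final int JDBC_BATCH_SIZE = 50; // matches hibernate.jdbc.batch_size

    // Save all entities, flushing and clearing every JDBC_BATCH_SIZE rows so the
    // first-level cache does not grow unboundedly. Returns the number of flushes.
    static int saveAll(MiniSession session, List<Object> entities) {
        int flushes = 0;
        int count = 0;
        for (Object entity : entities) {
            session.save(entity);
            if (++count % JDBC_BATCH_SIZE == 0) {
                session.flush();
                session.clear();
                flushes++;
            }
        }
        session.flush(); // flush the final partial batch
        session.clear();
        return flushes + 1;
    }

    public static void main(String[] args) {
        List<Object> rows = new ArrayList<>();
        for (int i = 0; i < 120; i++) rows.add(new Object());
        MiniSession noop = new MiniSession() {
            public void save(Object e) {}
            public void flush() {}
            public void clear() {}
        };
        // 120 rows: two full batches of 50, plus one final flush of 20
        System.out.println(saveAll(noop, rows)); // prints 3
    }
}
```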

  1. Would it be faster if I had these in a 'parent' class that held the objects and did a session.save(master) instead of each one?
  2. If I had them in a master/container class, how would I map that in Hibernate to reflect the relationship? The container class wouldn't actually be a table of its own, but a relationship based entirely on two indexes: run_id (int) and line (int).
  3. Another direction would be: How do I get Hibernate to do a multi-row insert?
+3  A: 

The ID generation strategy is critical for batch insertion in Hibernate. In particular, IDENTITY generation will usually not work (note that AUTO typically maps to IDENTITY as well). This is because during batch inserts Hibernate checks a flag called "requiresImmediateIdAccess" that says whether generated IDs are required immediately; if so, batch processing is disabled.

You can easily spot this in the DEBUG-level logs when it says "executing identity-insert immediately" - this means it has skipped batch processing because it was told that generated IDs are required immediately after insertion.

Generation strategies that typically do work are TABLE and SEQUENCE, because Hibernate can pre-generate the IDs, thereby allowing for batch insertion.
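As an illustration, a pre-allocating generator in an `hbm.xml` mapping might look like the following sketch. The class and table names are placeholders based on the question, and since MySQL has no sequences, a table-based (hilo) generator is the usual choice there:

```xml
<!-- Hypothetical mapping sketch: a table-based (hilo) generator lets
     Hibernate pre-allocate ids, so batch insertion stays enabled. -->
<class name="com.example.InputLine" table="INPUT_LINE">
  <id name="id" column="ID" type="long">
    <generator class="hilo">
      <param name="table">hibernate_hilo</param>
      <param name="column">next_hi</param>
      <param name="max_lo">100</param>
    </generator>
  </id>
  <!-- ... other properties ... -->
</class>
```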

A quick way to spot whether your batch insertion works is to activate DEBUG-level logs as BatchingBatcher will explicitly tell you the batch size it's executing ("Executing batch size: " + batchSize ).

Additionally, the following properties are important for achieving batch insertion. I daren't say they're required, as I'm not enough of a Hibernate expert to do so - perhaps it's just my particular configuration - but in my experience they were needed nonetheless:

hibernate.order_inserts = true
hibernate.order_updates = true

These properties are pretty poorly documented, but I believe they enable the SQL INSERT and UPDATE statements to be properly grouped for batch execution; I think this may be the multi-row insert behavior you're after. Don't shoot me if I'm wrong on this - I'm recalling from memory.

I'll also go ahead and assume that you set the following property; if not, this should serve as a reminder:

hibernate.jdbc.batch_size = xx

Where xx is your desired batch size, naturally.
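Putting those together, the relevant entries in `hibernate.cfg.xml` might look like this sketch (property-file syntax works equally well; the batch size of 50 matches the value mentioned in the question):

```xml
<property name="hibernate.jdbc.batch_size">50</property>
<property name="hibernate.order_inserts">true</property>
<property name="hibernate.order_updates">true</property>
```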

voetsjoeba
@JDR: Thank you very much for all this info- I am in the process of changing my ID generation to native (will this disable batching as well?), and will add your other ideas. I already have jdbc.batch_size = 50 but don't have order_inserts/updates. I'll let you know how well it works.
manyxcxi
@JDR: I implemented your suggestions and I can see that it is grouping my inserts together, so the tables are getting inserted in order: all of the Test rows, then all of the next table, etc. It is showing me that the batch size is 50, which is good. But I'm still getting roughly the same performance I was getting before - approximately 5 minutes for 3300 records (keep in mind, each record means one insert into 5 different tables).
manyxcxi
@manyxcxi: Changing the IDs to native disabled batching for me due to the requiresImmediateIdAccess variable (using H2 and Hibernate 3.5.1-Final). If the batcher says that it's performing batches of 50, then that's pretty much what it's doing - that's sort of the goal here. Five minutes for 3300 records seems a bit excessive; I can insert 1000 records in about a second even with some additional Hibernate Search overhead (albeit on an embedded H2 database). Do you have some sample output logs anywhere I can look at? What database are you running, and what connection pool are you using?
voetsjoeba
@voetsjoeba: I'm using MySQL (the tables are InnoDB), and I misspoke: the 5 minutes is how long a total run takes. There is a lot of processing overhead; in fact, I would hazard a guess that as much as 90% of it is overhead I can't control (waiting for a response from a separate calculation engine). The general idea is: read 500 lines at a time out of CSV, turn them into program-specific objects, send them to the calculator, handle the response objects, turn them into database objects, and write the batch back. I'll try to clean up the logs to show.
manyxcxi
@voetsjoeba: I forgot to mention, I deviated from the plan and didn't use a generator. My Java code assigns each object an id so Hibernate doesn't have to handle it.
manyxcxi
A: 

The final solution for me was to use voetsjoeba's response as a jumping-off point. My Hibernate config uses the following options:

hibernate.order_inserts = true
hibernate.order_updates = true
  • I changed from using Session to StatelessSession

  • Re-ordered the Java code to process all the elements in a batch a table at a time. So all of table x, then table y, etc.

  • Removed the <generator> from each class. The Java code now creates the id and assigns it to the object

  • Created logic that detects when only an id was being set, so 'empty' lines are not written to the database

  • Finally, I turned on dynamic-insert for my classes in their Hibernate mappings, like so: <class name="com.my.class" table="MY_TABLE" dynamic-insert="true">
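The "one table at a time" reordering above can be sketched as follows. This is a hedged illustration, not the author's exact code: `StatelessSession` is stubbed with a minimal interface so the grouping idea stands alone, and entities are grouped by class before insertion so all rows for one table are written consecutively and the driver can batch them:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupedInsertSketch {
    // Minimal stand-in for org.hibernate.StatelessSession.
    interface MiniStatelessSession {
        void insert(Object entity);
    }

    // Group entities by class, then insert each group in full before moving on,
    // so statements for one table run back-to-back and batch together.
    // Returns the table (class) order for inspection.
    static List<String> insertGrouped(MiniStatelessSession session, List<Object> entities) {
        Map<Class<?>, List<Object>> byTable = new LinkedHashMap<>();
        for (Object e : entities) {
            byTable.computeIfAbsent(e.getClass(), k -> new ArrayList<>()).add(e);
        }
        List<String> order = new ArrayList<>();
        for (Map.Entry<Class<?>, List<Object>> entry : byTable.entrySet()) {
            order.add(entry.getKey().getSimpleName());
            for (Object e : entry.getValue()) {
                session.insert(e);
            }
        }
        return order;
    }
}
```

With the real entity classes (Test, InputInvoice, etc.) this yields "all of table x, then table y" exactly as described in the bullet above.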

manyxcxi