I have a relatively simple object model:

ParentObject
    Collection<ChildObject1>
    ChildObject2

The MySQL operation when saving this object model does the following:

  • Update the ParentObject
  • Delete all previous items from the ChildObject1 table (about 10 rows)
  • Insert all new ChildObject1 (again, about 10 rows)
  • Insert ChildObject2

The objects / tables are unremarkable - no strings, mostly ints and longs.

MySQL is currently saving about 20-30 instances of the object model per second. When this goes into production it's going to be doing upwards of a million saves, which at current speeds will take 10+ hours - which is no good to me...

I am using Java and Spring. I have profiled my app, and the bottleneck is in the calls to MySQL by a long way.

How would you suggest I increase the throughput?

A: 

Deleting all of the existing ChildObject1 records and then re-inserting the ChildObject1 instances from the current state of your parent object seems unnecessary to me. Are the values of all of the child objects actually different from what was previously stored?

A better solution might involve only modifying the database when you need to, i.e. when there has been a change in state of the ChildObject1 instances.

Rolling your own persistence logic for this type of thing can be hard (your persistence layer needs to know the state of the ChildObject1 objects as they were retrieved, in order to compare them with the versions of the objects at save time). You might want to look into using an ORM like Hibernate for something like this; it does an excellent job of knowing when it does and does not need to update records in the database. A rough sketch of such a mapping follows.
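
To make that concrete, here is a minimal JPA/Hibernate mapping of the model in the question (the field names are invented for illustration, not the asker's actual code). With a mapping like this, Hibernate's dirty checking only issues SQL for rows whose state actually changed, and orphanRemoval takes care of deleted children:

    import java.util.ArrayList;
    import java.util.List;
    import javax.persistence.CascadeType;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;
    import javax.persistence.ManyToOne;
    import javax.persistence.OneToMany;
    import javax.persistence.OneToOne;

    @Entity
    class ChildObject1 {
        @Id @GeneratedValue
        long id;

        @ManyToOne
        ParentObject parent;   // owning side of the collection

        int value;             // mainly ints and longs, per the question
    }

    @Entity
    class ChildObject2 {
        @Id @GeneratedValue
        long id;

        long value;
    }

    @Entity
    class ParentObject {
        @Id @GeneratedValue
        long id;

        // Removing a child from this list deletes its row (orphanRemoval);
        // children whose state hasn't changed generate no SQL at all.
        @OneToMany(mappedBy = "parent", cascade = CascadeType.ALL, orphanRemoval = true)
        List<ChildObject1> children = new ArrayList<ChildObject1>();

        @OneToOne(cascade = CascadeType.ALL)
        ChildObject2 child2;
    }

Whether this ends up faster than delete-and-reinsert depends on how many children actually change per save, so it's worth measuring before committing to it.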

matt b
Hibernate would be a terrible idea for this; straight JDBC with batching is what you want.
fuzzy lollipop
I am monitoring dirty state on the other objects, but I can't with the child objects because they are aggregations of something else. These aggregations may be new, modified or deleted, and unpicking which is which would actually involve more database work than simply deleting and re-inserting.
MalcomTucker
A: 

You can get some speedup by tracking a dirty flag on your objects (especially your collection of child objects). You only delete/update the dirty ones. Depending on what % of them change on each write, you might save a good chunk.
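
For example (a hand-rolled sketch - the payload field and method names here are invented), each child object can flag itself when a setter actually changes its state:

    public class ChildObject1 {
        private long id;
        private int value;           // hypothetical payload field
        private boolean dirty;       // in-memory only, never persisted

        public void setValue(int value) {
            if (this.value != value) {   // flag only on a real change
                this.value = value;
                this.dirty = true;
            }
        }

        public boolean isDirty() {
            return dirty;
        }

        public void markClean() {    // call after a successful save
            dirty = false;
        }
    }

At save time the persistence code simply skips any child whose isDirty() returns false.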

The other thing you can do is bulk writes via batch updates on a prepared statement (look at PreparedStatement.addBatch()). This can be an order of magnitude faster, but it means the writes are no longer record by record; e.g. it might look something like:

  • delete all dirty-flagged children as a single batch command
  • update all parents as a single batch command
  • insert all dirty-flagged children as a single batch command.

Note that since you're dealing with millions of records, you're probably not going to be able to load them all into a map and dump them at once; you'll have to stream them into a batch handler and flush the changes to the db every 1000 records or so, roughly as in the sketch below. Once you've done this, the actual speed is sensitive to the batch size, and you'll have to find a good value by trial and error.
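
A minimal JDBC sketch of that shape - the table and column names (parent_object, child_object1, col_a, val) and the getters on the question's objects are invented here; the real schema will differ. The whole save runs in one transaction, and each statement's batch is flushed every 1000 parents:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    public class BatchSaver {

        private static final int BATCH_SIZE = 1000;  // tune by trial and error

        // ParentObject/ChildObject1 are the question's classes; the getters
        // and the table/column names below are assumptions for illustration.
        public void saveAll(Connection conn, List<ParentObject> parents) throws SQLException {
            conn.setAutoCommit(false);  // one transaction: readers never see a parent without its children
            try (PreparedStatement deleteChildren = conn.prepareStatement(
                         "DELETE FROM child_object1 WHERE parent_id = ?");
                 PreparedStatement updateParent = conn.prepareStatement(
                         "UPDATE parent_object SET col_a = ? WHERE id = ?");
                 PreparedStatement insertChild = conn.prepareStatement(
                         "INSERT INTO child_object1 (parent_id, val) VALUES (?, ?)")) {

                int pending = 0;
                for (ParentObject p : parents) {
                    deleteChildren.setLong(1, p.getId());
                    deleteChildren.addBatch();

                    updateParent.setLong(1, p.getColA());
                    updateParent.setLong(2, p.getId());
                    updateParent.addBatch();

                    for (ChildObject1 c : p.getChildren()) {
                        insertChild.setLong(1, p.getId());
                        insertChild.setInt(2, c.getVal());
                        insertChild.addBatch();
                    }

                    if (++pending == BATCH_SIZE) {
                        flush(deleteChildren, updateParent, insertChild);
                        pending = 0;
                    }
                }
                flush(deleteChildren, updateParent, insertChild);
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }

        // Deletes must run before the inserts that replace the same rows.
        private void flush(PreparedStatement... statements) throws SQLException {
            for (PreparedStatement ps : statements) {
                ps.executeBatch();
            }
        }
    }

One MySQL-specific note: with Connector/J you generally also want rewriteBatchedStatements=true on the JDBC URL, otherwise the driver tends to send the batched inserts as individual statements and you lose most of the win.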

Steve B.
I thought about this, but if I break apart the objects and batch them I potentially have data inconsistency - if a user queries a parent object after I have deleted the children but before inserting the new ones, they will retrieve an incomplete dataset. That's also going to be unacceptable... does that make sense?
MalcomTucker