I have a web application that receives messages through an HTTP interface, e.g.:
http://server/application?source=123&destination=234&text=hello
This request contains the ID of the sender, the ID of the recipient and the text of the message.
Each message should be processed as follows:
- finding the matching User object for both the source and the destination from the database
- creating a tree of objects: a Message that contains a field for the message text and two User objects for the source and the destination (a sketch of these classes follows the list)
- persisting this tree to a database.
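For reference, the persisted object tree looks roughly like this (a simplified sketch; the real classes have more fields, and the exact names and annotations may differ):

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.ManyToOne;

// User.java
@Entity
public class User {
    @Id
    private Long id;
    // other user fields omitted
}

// Message.java
@Entity
public class Message {
    @Id
    @GeneratedValue
    private Long id;

    private String text;

    @ManyToOne
    private User source;        // sender

    @ManyToOne
    private User destination;   // recipient

    // getters and setters omitted
}
```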
The tree will be loaded by other applications that I can't touch.
I use Oracle as the backing database and JPA with TopLink for the persistence work. If possible, I'd like to stay with these.
Without much optimization I can achieve ~30 requests/sec throughput in my environment. That's not enough; I need ~300 requests/sec. So I measured where the performance bottleneck is and found that the calls to em.persist() take most of the time. If I simply comment out that line, the throughput goes well over 1000 requests/sec.
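To make the setup concrete, the per-request persistence path is roughly the following (a simplified sketch; the class, method, and setter names are placeholders, and error handling is omitted):

```java
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;

public class MessagePersister {

    private final EntityManagerFactory emf; // created once and reused

    public MessagePersister(EntityManagerFactory emf) {
        this.emf = emf;
    }

    // One message per transaction, as it works today (simplified).
    public void persist(long sourceId, long destinationId, String text) {
        EntityManager em = emf.createEntityManager();
        try {
            em.getTransaction().begin();

            User source = em.find(User.class, sourceId);           // look up the sender
            User destination = em.find(User.class, destinationId); // look up the recipient

            Message message = new Message();
            message.setText(text);
            message.setSource(source);
            message.setDestination(destination);

            // According to my measurements, most of the request time is spent around here.
            em.persist(message);

            em.getTransaction().commit();
        } finally {
            em.close();
        }
    }
}
```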
I wrote a small test application that used plain JDBC calls to persist 1 million messages to the same database. I used batching: I did 100 inserts, then a commit, and repeated until all the records were in the database. I measured ~500 requests/sec throughput in this scenario, which would meet my needs.
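The JDBC test did roughly the following (sketched from memory; the table and column names are made up and the connection details are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class JdbcBatchTest {
    public static void main(String[] args) throws Exception {
        // Connection details are placeholders.
        Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//server:1521/service", "user", "password");
        con.setAutoCommit(false);

        PreparedStatement ps = con.prepareStatement(
                "INSERT INTO MESSAGE (SOURCE_ID, DESTINATION_ID, TEXT) VALUES (?, ?, ?)");

        for (int i = 1; i <= 1000000; i++) {
            ps.setLong(1, 123L);
            ps.setLong(2, 234L);
            ps.setString(3, "hello");
            ps.addBatch();

            if (i % 100 == 0) {        // 100 inserts per batch, then commit
                ps.executeBatch();
                con.commit();
            }
        }
        ps.executeBatch();             // flush any remaining rows
        con.commit();

        ps.close();
        con.close();
    }
}
```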
It is clear that I need to optimize insert performance here. However, as I mentioned earlier, I would like to keep using JPA and TopLink for this, not plain JDBC.
Do you know a way to do batch inserts with JPA and TopLink? Can you recommend any other techniques for improving JPA persist performance?
ADDITIONAL INFO:
"requests/sec" means here: total number of requests / total time from beginning of test to last record written to database.
I tried to make the calls to em.persist() asynchronous by creating an in-memory queue between the servlet code and the persister. This improved performance greatly. However, the queue grew very fast, and since the application will receive ~200 requests/second continuously, this is not an acceptable solution for me.
In this decoupled approach I collected requests for 100 msec and called em.persist() on all collected items before committing the transaction. The EntityManagerFactory is cached and reused across transactions.
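The decoupled version has roughly this shape (a simplified sketch; the class and method names are placeholders, and shutdown, error handling, and detached-entity details are left out):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;

public class AsyncPersister implements Runnable {

    // Minimal holder for the data taken from the HTTP request.
    public static class Request {
        final long sourceId;
        final long destinationId;
        final String text;

        public Request(long sourceId, long destinationId, String text) {
            this.sourceId = sourceId;
            this.destinationId = destinationId;
            this.text = text;
        }
    }

    private final EntityManagerFactory emf; // created once and cached
    private final BlockingQueue<Request> queue = new LinkedBlockingQueue<Request>();

    public AsyncPersister(EntityManagerFactory emf) {
        this.emf = emf;
    }

    // Called from the servlet: enqueue and return immediately.
    public void submit(Request request) {
        queue.offer(request);
    }

    public void run() {
        while (true) {
            // Collect whatever arrives within a 100 msec window.
            List<Request> batch = new ArrayList<Request>();
            long deadline = System.currentTimeMillis() + 100;
            long remaining;
            while ((remaining = deadline - System.currentTimeMillis()) > 0) {
                try {
                    Request r = queue.poll(remaining, TimeUnit.MILLISECONDS);
                    if (r != null) {
                        batch.add(r);
                    }
                } catch (InterruptedException e) {
                    return; // real shutdown handling omitted
                }
            }
            if (batch.isEmpty()) {
                continue;
            }

            // Persist the whole window in a single transaction.
            EntityManager em = emf.createEntityManager();
            try {
                em.getTransaction().begin();
                for (Request r : batch) {
                    Message message = new Message();
                    message.setText(r.text);
                    message.setSource(em.find(User.class, r.sourceId));
                    message.setDestination(em.find(User.class, r.destinationId));
                    em.persist(message);
                }
                em.getTransaction().commit();
            } finally {
                em.close();
            }
        }
    }
}
```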