I have a standalone app developed in Spring and Hibernate.
I need to update a pretty large dataset, and right now the update is so slow that it is unusable. I'm looking for options to implement this more efficiently. I realize Hibernate isn't the ideal tool for large batch updates, but I need to make it work for now.
There are 3 tables: KEYWORD, MOVIE and REVIEW. There is a 1-n relationship from Keyword to both Movie and Review.
Right now I:
a. query the Keyword table (~4 million rows), paging at 50 rows per result set
b. check if that keyword exists in the movie table (by doing a LIKE on the description field)
- attach the collection retrieved to the Keyword object
c. check if that keyword exists in the review table (by doing a LIKE on the text field)
- attach the collection retrieved to the Keyword object
d. update by calling saveOrUpdateAll
e. call session flush and clear.
f. move on to the next 50
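To make the shape of the loop concrete, here is a rough in-memory sketch of steps a-f in plain Java. It is not my actual Hibernate code: the Keyword class, findContaining and process are hypothetical stand-ins, the tables are modelled as plain lists, and I am assuming the LIKE is a "contains" match ('%word%'), approximated here with String.contains (which is case-sensitive, unlike LIKE on some databases).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KeywordBatchSketch {

    /** Hypothetical stand-in for the mapped Keyword entity. */
    static class Keyword {
        final String word;
        List<String> movies = new ArrayList<>();
        List<String> reviews = new ArrayList<>();
        Keyword(String word) { this.word = word; }
    }

    static final int PAGE_SIZE = 50;

    /** Stand-in for "column LIKE '%word%'" (assumed pattern): scans every row. */
    static List<String> findContaining(List<String> rows, String word) {
        List<String> hits = new ArrayList<>();
        for (String row : rows) {
            if (row.contains(word)) {
                hits.add(row);
            }
        }
        return hits;
    }

    /** Runs steps a-f over the in-memory data and returns the number of batches processed. */
    static int process(List<Keyword> keywords, List<String> movieDescriptions, List<String> reviewTexts) {
        int batches = 0;
        for (int start = 0; start < keywords.size(); start += PAGE_SIZE) {
            // (a) fetch the next page of up to 50 keywords
            int end = Math.min(start + PAGE_SIZE, keywords.size());
            List<Keyword> page = keywords.subList(start, end);
            for (Keyword k : page) {
                // (b) match against movie descriptions and attach the result
                k.movies = findContaining(movieDescriptions, k.word);
                // (c) match against review text and attach the result
                k.reviews = findContaining(reviewTexts, k.word);
            }
            // (d) + (e) in the real job: saveOrUpdateAll on the page, then session flush and clear
            batches++;
            // (f) loop continues with the next 50
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Keyword> keywords = new ArrayList<>();
        for (int i = 0; i < 120; i++) {
            keywords.add(new Keyword("kw" + i + "x"));
        }
        List<String> movies = Arrays.asList("a film about kw7x and kw8x", "nothing here");
        List<String> reviews = Arrays.asList("kw7x was great");
        int batches = process(keywords, movies, reviews);
        System.out.println("batches processed: " + batches);
        System.out.println("movies matched for kw7x: " + keywords.get(7).movies.size());
    }
}
```

In this sketch every keyword triggers one pass over the movie rows and one pass over the review rows, which mirrors the two LIKE queries issued per keyword in steps b and c.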
I also have indexes on the keyword(word), movie(description) and review(review_text) fields.
I figure the query cache or 2nd level cache won't help me since the keywords are unique.
Just to be clear, this is a single-threaded batch process using Spring's HibernateTemplate. Right now each set of 50 keywords is taking about 30 minutes or more to update.