Heya,
I'm going to preface this with the fact that I'm a relative Java/Scala newbie, so I wouldn't rule out that there is something obvious I'm not doing.
I've got a Scala application that connects to a MySQL database via Hibernate. The application is designed to process a large amount of data, about 2,750,000 records, so I've tried to optimise it as much as possible.
It's running on my workstation, a quad-core Intel Xeon with 6 GB of RAM (at 1033 MHz), and it runs nicely and quickly for the first 70k records, completing them in about 15 minutes. By the time it gets to 90k, it has taken about 25 minutes, so something is making it slow to a crawl.
I've checked the timers on the Hibernate code, and the database retrieval is taking the same amount of time as usual. I've even tried forcing a manual garbage collection to fix it, but that isn't working either.
The code in question looks something like:
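```scala
import scala.actors.Futures._

val recordCount = repo.recordCount
val batchSize = 100
// Start offset of every batch: 0, 100, 200, ...
val batches = (0 to recordCount by batchSize).toList
val batchJobs = {
  // One future per batch; each RecordFormatter gets its own Hibernate session
  for (batchStart <- batches) yield {
    future(new RecordFormatter().formatRecords(new Repo(sessionFactory.openSession), batchStart, batchSize))
  }
}
// Wait up to 100,000 ms for all batches to finish
awaitAll(100000, batchJobs: _*)
```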
Inside the RecordFormatter (which isn't actually named that, in case you're wondering at my naming-scheme madness), it runs one query to find the next 100 records, then another query to pull back the actual records (using BETWEEN on the start and end values), and then writes them out to a text file as CSV. According to the timer output, each RecordFormatter run takes about 5 seconds to pull back the records and then 0.1 seconds to write them to the file.
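To make the two-query step concrete, here is a minimal, self-contained sketch of the pure parts of one batch: deriving the inclusive bounds for the BETWEEN clause from the IDs returned by the first query, and turning a row into a CSV line. The `Record` case class, its fields and the `BatchCsv` helpers are hypothetical; no Hibernate calls are shown, and the real RecordFormatter may well do this differently.

```scala
// Hypothetical record shape; the real entity's fields are not shown in the post.
final case class Record(id: Long, name: String, value: Double)

object BatchCsv {
  // Given the IDs returned by the first query, the inclusive bounds for a
  // `WHERE id BETWEEN :start AND :end` clause are simply the min and max ID.
  def betweenBounds(ids: Seq[Long]): Option[(Long, Long)] =
    if (ids.isEmpty) None else Some((ids.min, ids.max))

  // Quote a field only when it contains a comma, quote or line break
  // (RFC 4180 style), doubling any embedded quotes.
  def csvField(s: String): String =
    if (s.exists(c => c == ',' || c == '"' || c == '\n' || c == '\r'))
      "\"" + s.replace("\"", "\"\"") + "\""
    else s

  // One CSV line per record, fields in declaration order.
  def toCsvLine(r: Record): String =
    Seq(r.id.toString, r.name, r.value.toString).map(csvField).mkString(",")
}
```

Because BETWEEN is inclusive at both ends, taking the bounds from the IDs actually returned keeps gaps in the ID sequence from pulling in rows that belong to a neighbouring batch.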
Despite this, once it has slowed down it only processes about 12 batches of 100 records per minute, as opposed to 40 batches of 100 records per minute when the process first starts.
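The two rates above can be turned into an average number of batches in flight with Little's law (batches in flight = throughput × time per batch). This sketch assumes each batch still takes about 5.1 seconds (roughly 5 s to fetch plus 0.1 s to write, per the timers) both before and after the slowdown; the object name `ImpliedConcurrency` is hypothetical. Under that assumption, 40 batches per minute works out to about 3.4 batches running at once and 12 batches per minute to about 1, which points at lost parallelism rather than slower batches.

```scala
object ImpliedConcurrency {
  // Little's law: average number of batches in flight =
  // throughput (batches per second) * time each batch takes (seconds).
  def inFlight(batchesPerMinute: Double, secondsPerBatch: Double): Double =
    batchesPerMinute / 60.0 * secondsPerBatch

  def main(args: Array[String]): Unit = {
    // Assumed per-batch time: ~5 s to fetch + ~0.1 s to write, from the timers
    val perBatch = 5.0 + 0.1
    println(s"at start: ${inFlight(40, perBatch)} batches in flight") // about 3.4
    println(s"after slowdown: ${inFlight(12, perBatch)} batches in flight") // about 1.02
  }
}
```

Note that even the starting figure is well under eight, so on average not all threads would have been busy from the beginning either.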
It's flushing the Session at regular intervals and closing it at the end of each RecordFormatter run (each RecordFormatter has its own session).
I'm mostly looking for any known gotchas with Scala and futures. I have noticed that when it's slowing down, it doesn't seem to be using all eight possible threads, which could certainly explain the drop in speed, but it's a mystery to me why it would suddenly stop, and always around the 75k record mark.
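One way to rule the thread pool in or out is to run the batches on a pool whose size is set explicitly. The sketch below swaps in the standard library's `scala.concurrent` Futures on a fixed pool from `java.util.concurrent.Executors`, instead of the `scala.actors` `future`/`awaitAll` used in the code above. `formatBatch` is a hypothetical stand-in for `RecordFormatter.formatRecords` that just counts the records in its batch, so the sketch runs without a database; it assumes a recent Scala version.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object BoundedBatches {
  // Hypothetical stand-in for RecordFormatter.formatRecords: returns how many
  // records fall into the batch starting at batchStart.
  def formatBatch(batchStart: Int, batchSize: Int, recordCount: Int): Int =
    math.max(0, math.min(batchSize, recordCount - batchStart))

  def run(recordCount: Int, batchSize: Int, threads: Int): Int = {
    // A dedicated pool with a fixed number of threads, so the level of
    // parallelism is explicit rather than left to a shared default pool.
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val jobs = (0 until recordCount by batchSize).map { start =>
        Future(formatBatch(start, batchSize, recordCount))
      }
      // Wait for every batch; Future.sequence fails if any batch throws.
      Await.result(Future.sequence(jobs), 10.minutes).sum
    } finally pool.shutdown()
  }

  def main(args: Array[String]): Unit =
    println(run(recordCount = 2750, batchSize = 100, threads = 8))
}
```

With a fixed pool of eight threads, at most eight batches (and therefore at most eight open Hibernate sessions) run at any moment, and the remaining batches simply queue until a thread frees up.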
Thanks!
EDIT: Updated code to show it uses yield and awaitAll in case that makes a difference.