Background
I have a Spring batch program that reads a file (example file I am working with is ~ 4 GB in size), does a small amount of processing on the file, and then writes it off to an Oracle database.
My program uses 1 thread to read the file, and 12 worker threads to do the processing and database pushing.
I am churning lots and lots and lots of young gen memory, which is causing my program to go slower than I think it should.
Setup
JDK 1.6.18
Spring batch 2.1.x
4 Core Machine w 16 GB ram
-Xmx12G
-Xms12G
-NewRatio=1
-XX:+UseParallelGC
-XX:+UseParallelOldGC
Problem
With these JVM params, I get somewhere around 5.x GB of memory for Tenured Generation, and around 5.X GB of memory for Young Generation.
In the course of processing this one file, my Tenured Generation is fine. It grows to a max of maybe 3 GB, and I never need to do a single full GC.
However, the Young Generation hits it's max many times. It goes up to 5 GB range, and then a parallel minor GC happens and clears Young Gen down to 500MB used. Minor GCs are good and better than a full GC, but it still slows down my program a lot (I am pretty sure the app still freezes when a young gen collection occurs, because I see the database activity die off). I am spending well over 5% of my program time frozen for minor GCs, and this seems excessive. I would say over the course of processing this 4 GB file, I churn through 50-60GB of young gen memory.
I don't see any obvious flaws in my program. I am trying to obey the general OO principles and write clean Java code. I am trying not to create objects for no reason. I am using thread pools, and whenever possible passing objects along instead of creating new objects. I am going to start profiling the application, but I was wondering if anyone had some good general rules of thumb or anti patterns to avoid that lead to excessive memory churn? Is 50-60GB of memory churn to process a 4GB file the best I can do? Do I have to revert to JDk 1.2 tricks like Object Pooling? (although Brian Goetz give a presentation that included why object pooling is stupid, and we don't need to do it anymore. I trust him a lot more than I trust myself .. :) )