Background

I have a Spring Batch program that reads a file (the example file I am working with is ~4 GB), does a small amount of processing on it, and then writes it off to an Oracle database.

My program uses 1 thread to read the file, and 12 worker threads to do the processing and database pushing.

I am churning through a huge amount of young gen memory, which is causing my program to run slower than I think it should.

Setup

JDK 1.6.18
Spring Batch 2.1.x
4-core machine with 16 GB RAM

-Xmx12G 
-Xms12G 
-XX:NewRatio=1 
-XX:+UseParallelGC
-XX:+UseParallelOldGC

Problem

With these JVM params, I get somewhere around 5.x GB of memory for the Tenured Generation and around 5.x GB for the Young Generation.

In the course of processing this one file, my Tenured Generation is fine. It grows to a max of maybe 3 GB, and I never need to do a single full GC.

However, the Young Generation hits its max many times. It goes up to the 5 GB range, and then a parallel minor GC clears Young Gen down to about 500 MB used. Minor GCs are good and better than a full GC, but they still slow my program down a lot (I am pretty sure the app still freezes when a young gen collection occurs, because I see the database activity die off). I am spending well over 5% of my program time frozen for minor GCs, and this seems excessive. Over the course of processing this 4 GB file, I churn through 50-60 GB of young gen memory.

I don't see any obvious flaws in my program. I am trying to obey general OO principles and write clean Java code. I am trying not to create objects for no reason. I am using thread pools, and whenever possible I pass objects along instead of creating new ones. I am going to start profiling the application, but I was wondering if anyone had some good general rules of thumb or anti-patterns to avoid that lead to excessive memory churn. Is 50-60 GB of memory churn to process a 4 GB file the best I can do? Do I have to revert to JDK 1.2 tricks like object pooling? (Although Brian Goetz gave a presentation that included why object pooling is stupid and we don't need to do it anymore. I trust him a lot more than I trust myself... :) )

+2  A: 

I think a session with a memory profiler will shed a lot of light on the subject. It gives a nice overview of how many objects are created, and that is sometimes revealing on its own.

I am always amazed at how many strings are generated.

For domain objects, cross-referencing them is also revealing. If you suddenly see three times more derived objects than source objects, then something is going on there.

NetBeans has a nice one built in. I have used JProfiler in the past. I think if you bang on Eclipse long enough, you can get the same info from the TPTP tools.

Peter Tillemans
Does jvisualvm (usable in this case, since it's Java 6) help with identifying these problems?
Donal Fellows
Good ideas, I will try the NetBeans profiler and jvisualvm. I am an Eclipse guy but never had a ton of luck with TPTP.
bwawok
So... 90% of my total memory is in char[] in "oracle.sql.converter.toOracleStringWithReplacement". That narrows it down, but I am not sure how to narrow it further, or whether something like the flyweight pattern would reduce memory here.
bwawok
+1  A: 

In my opinion, the young generation should not be as big as the old generation, so that the minor garbage collections stay fast.

Do you have many objects that represent the same value? If you do, merge these duplicate objects using a simple HashMap:

import java.util.concurrent.ConcurrentHashMap;

public class MemorySavingUtils {

    private final ConcurrentHashMap<String, String> knownStrings = new ConcurrentHashMap<String, String>();

    // Return the canonical instance of s, storing s itself the first time it is seen.
    // (putIfAbsent returns null on the first insert, so fall back to s in that case.)
    public String unique(String s) {
        String existing = knownStrings.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    public void clear() {
        knownStrings.clear();
    }
}

With the Sun HotSpot VM, the native String.intern() is really slow for large numbers of Strings, which is why I suggest building your own String interner.

Using this method, strings from the old generation are reused and strings from the new generation can be garbage collected quickly.
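For illustration, a hypothetical call site (the class name and values below are invented for this example, not taken from the original program):

import java.util.concurrent.ConcurrentHashMap;

public class InternerExample {

    // One shared interner; every duplicate value returned from unique() points to the
    // first instance stored, so repeated field values stop churning the young gen.
    private static final MemorySavingUtils INTERNER = new MemorySavingUtils();

    public static void main(String[] args) {
        String a = INTERNER.unique(new String("STATUS_OK"));
        String b = INTERNER.unique(new String("STATUS_OK"));
        System.out.println(a == b);  // true: both calls return the first stored instance
    }
}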

Roland Illig
Only worthwhile if you've got repetition of strings, especially within a batch. Otherwise you're not helping. (And don't use `String.intern` at all unless you *know* it is useful in the specific case you're dealing with; interning is an optimization…)
Donal Fellows
1) I have tried a NewRatio of 2 (the default), as well as 4 and 6. None of it helped. My GCs were slightly faster, but happened more often. 10 GCs of 5 GB each seem to take just about as long as 100 GCs of 500 MB each (I think the bigger GCs may have benchmarked slightly faster).
bwawok
2) No strings should be duplicates, or at least not very many of them. I know a few parts of the file are one of 3 possible choices... I could specifically intern those. Not sure if this is a micro-optimization though. I am not worried about some churn, just about churning 10x the amount of my data set.
bwawok
+3  A: 

It would be really useful if you clarified your terms "young" and "tenured" generation, because Java 6 has a slightly different GC model: Eden, S0+S1, Old, Perm.

Have you experimented with the different garbage collection algorithms? How have "UseConcMarkSweepGC" or "UseParNewGC" performed?
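For reference, those collectors would be enabled with flags like the ones below, in place of the two parallel-collector flags from the Setup section:

-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC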

And don't forget that simply increasing the available space is NOT the solution, because a GC run will then take much longer; decrease the size to normal values ;)

Are you sure you have no memory leaks? In a producer-consumer pattern like the one you describe, data should rarely end up in the Old Gen, because those jobs are processed really fast and then "thrown away". Or is your work queue filling up?

You should definitely observe your program with a memory analyzer.

Tobias P.
I wouldn't use `UseConcMarkSweepGC` here; response time is not important for batch processing (see http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#available_collectors.selecting). Anyway, I don't think the problem is the GC algorithm.
Pascal Thivent
I tried concurrent mark sweep, and I lost about 10% of my performance. I agree it's not great for batch processing.
bwawok
I was using the terms young and tenured to refer to new and old as returned to me by jmap -heap. I likely blew the terminology somewhere. I have 16 GB of RAM to use; if I can go from 2 GB of memory to 12 GB and get a 5-10% speedup, it is well worth it. Not sure I see a good reason to bring the memory down. I trade 10 slow GCs for 100 fast GCs... but spend the same time in GC. I think I need to reduce churn, not my newgen size, to increase my speed...
bwawok
As to the memory leak issue: could be, but I don't think that is what's causing my problem. I cache 1-2 GB of data before my batch process, so 3-3.5 GB sitting in old gen is not a problem for me. My work queue does fill up, but it is bounded with a java.util.concurrent.BlockingQueue, so I make sure no more than ~10% of the file is in memory at any given point in time.
bwawok
+1  A: 

Read a line from a file, store as a string and put in a list. When the list has 1000 of these strings, put it in a queue to be read by worker threads. Have said worker thread make a domain object, peel a bunch of values off the string to set the fields (int, long, java.util.Date, or String), and pass the domain object along to a default spring batch jdbc writer

If that's your program, why not set a smaller memory size, like 256 MB?
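For reference, a minimal sketch of the reader side of the pipeline quoted above, assuming a bounded queue of 1000-line chunks; the class name, queue capacity, and chunk size here are illustrative, not taken from the actual program:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// One reader thread groups lines into chunks of 1000 and hands them to the
// worker threads over a bounded queue, so only a small slice of the file is
// in memory at any time (the queue blocks the reader when workers fall behind).
public class ChunkingReader implements Runnable {

    private final BlockingQueue<List<String>> queue = new ArrayBlockingQueue<List<String>>(10);
    private final String path;

    public ChunkingReader(String path) {
        this.path = path;
    }

    public BlockingQueue<List<String>> getQueue() {
        return queue;
    }

    public void run() {
        try {
            BufferedReader in = new BufferedReader(new FileReader(path));
            try {
                List<String> chunk = new ArrayList<String>(1000);
                String line;
                while ((line = in.readLine()) != null) {
                    chunk.add(line);
                    if (chunk.size() == 1000) {
                        queue.put(chunk);
                        chunk = new ArrayList<String>(1000);
                    }
                }
                if (!chunk.isEmpty()) {
                    queue.put(chunk);
                }
            } finally {
                in.close();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}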

irreputable
a) I precache a HashMap of data that is around 1-2 GB (hence the stuff that lives in old gen). b) I have lots of memory and 16 threads; this program has the entire server to run on, so I am not worried about "wasting" memory.
bwawok
Just because there is no other process running on that server doesn't mean your program should allocate all the memory. You should give it only as much memory as it needs, and a little extra for unexpected circumstances. That way, the garbage collector doesn't have to keep objects longer than necessary.
Roland Illig
People say that GC performs very badly on a heap beyond a couple of GB. I don't understand why - GC works on live objects only, so why does it matter how many dead objects there are - but that's what people say.
irreputable
+1  A: 

I'm guessing with a memory limit that high you must be reading the file entirely into memory before doing the processing. Could you consider using a java.io.RandomAccessFile instead?
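For illustration, a minimal sketch of reading a fixed-size window of the file at a given offset with java.io.RandomAccessFile instead of loading the whole file; the class and method names are made up for this example:

import java.io.RandomAccessFile;

public class WindowedFileReader {

    // Reads 'length' bytes starting at 'offset', so only one window of the
    // file is held in memory at a time.
    public static byte[] readWindow(String path, long offset, int length) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(path, "r");
        try {
            byte[] buffer = new byte[length];
            raf.seek(offset);
            raf.readFully(buffer);
            return buffer;
        } finally {
            raf.close();
        }
    }
}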

Ceilingfish
Actually, I am not. In order to avoid "wasting" memory, I use a java.util.concurrent.BlockingQueue. I keep just enough of the file read to keep all the workers busy, but I never have more than about 10% of the file in memory at the same time. In theory I will scale to much bigger files, in the 10-30 GB range, and definitely cannot fit all of that in memory.
bwawok
+2  A: 

You need to profile your application to see what exactly is happening. I would also first try to use the ergonomics feature of the JVM, as recommended:

2. Ergonomics

A feature referred to here as ergonomics was introduced in J2SE 5.0. The goal of ergonomics is to provide good performance with little or no tuning of command line options by selecting the

  • garbage collector,
  • heap size,
  • and runtime compiler

at JVM startup, instead of using fixed defaults. This selection assumes that the class of the machine on which the application is run is a hint as to the characteristics of the application (i.e., large applications run on large machines). In addition to these selections is a simplified way of tuning garbage collection. With the parallel collector the user can specify goals for a maximum pause time and a desired throughput for an application. This is in contrast to specifying the size of the heap that is needed for good performance. This is intended to particularly improve the performance of large applications that use large heaps. The more general ergonomics is described in the document entitled “Ergonomics in the 5.0 Java Virtual Machine”. It is recommended that the ergonomics as presented in this latter document be tried before using the more detailed controls explained in this document.

Included in this document are the ergonomics features provided as part of the adaptive size policy for the parallel collector. This includes the options to specify goals for the performance of garbage collection and additional options to fine tune that performance.

See the more detailed section about Ergonomics in the Java SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning guide.
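For reference, the pause-time and throughput goals mentioned in the quote are expressed with flags such as these (the values below are placeholders, not a recommendation; GCTimeRatio=N asks the collector to keep GC at roughly 1/(1+N) of total time):

-XX:MaxGCPauseMillis=200
-XX:GCTimeRatio=19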

Pascal Thivent
Good idea, I will give ergonomics a chance and compare the results with what I have. However, I know that by default it starts with a very small heap and does gc, gc, grow heap, gc, gc, grow heap, gc, gc, grow heap... which is totally crappy. I think I shaved significant time off my run by starting -Xms and -Xmx at the desired size.
bwawok
@bwawok: I didn't mean to say "don't override `-Xms` and `-Xmx`"
Pascal Thivent
+3  A: 

I have a feeling that you are spending time and effort trying to optimize something that you should not bother with.

I am spending well over 5% of my program time frozen for minor GCs, and this seems excessive.

Flip that around. You are spending just under 95% of your program time doing useful work. Or, to put it another way: even if you managed to optimize the GC to run in ZERO time, the best improvement you could get is something over 5%.

If your application has hard timing requirements that are impacted by the pause times, you could consider using a low-pause collector. (Be aware that reducing pause times increases the overall GC overheads ...) However for a batch job, the GC pause times should not be relevant.

What probably matters most is the wall clock time for the overall batch job. And the (roughly) 95% of the time spent doing application-specific work is where you are likely to get more pay-off for your profiling / targeted optimization efforts. For example, have you looked at batching the updates that you send to the database?
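For illustration, a minimal sketch of batched JDBC inserts, assuming plain JDBC rather than the Spring Batch writer the program actually uses; the table and column names are invented for this example:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

public class BatchedInsertExample {

    // Adds one parameter set per row to a single INSERT statement, so the driver
    // can send the whole chunk at once instead of one statement per row.
    public static void writeChunk(Connection conn, List<String[]> rows) throws Exception {
        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO my_table (col_a, col_b, col_c) VALUES (?, ?, ?)");
        try {
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.setString(3, row[2]);
                ps.addBatch();
            }
            ps.executeBatch();
        } finally {
            ps.close();
        }
    }
}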

Stephen C
Spring Batch already does batching for me. I know this isn't the be-all and end-all that will make the program 100% faster... but the time spent in GC is over 5%, maybe even 6% or 7%. The wall clock time could be better, and that would help me...
bwawok