
I profiled a JDBC/Hibernate batch importer. It takes a CSV file, transforms it slightly, and imports it into a database sitting at localhost.

To my surprise, the operation was not I/O-bound but rather CPU-bound.

According to JMX/JConsole as well as the NetBeans profiler, about 60% of the CPU time was spent in the "old gen" garbage collector; the rest went to geometric conversions (which is reasonable) and Hibernate session management.

Other applications spend about 5-10% of CPU time on GC according to JConsole, so what are "typical" CPU ratios for young GC / old GC in such batch-insert tasks?
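For context, the ratio being asked about can also be read programmatically: the standard `java.lang.management` API exposes accumulated collection time per collector, so a rough GC-time fraction can be computed from inside the importer itself. A minimal sketch (class and method names are mine; note this compares GC wall time against JVM uptime, not CPU share, so it is only a rough cross-check of what JConsole shows):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcRatio {
    /** Total accumulated GC time (ms) across all registered collectors. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) total += t;
        }
        return total;
    }

    /** Rough fraction of JVM uptime spent in garbage collection. */
    static double gcTimeFraction() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime > 0 ? (double) totalGcTimeMillis() / uptime : 0.0;
    }

    public static void main(String[] args) {
        System.out.printf("GC time fraction: %.2f%%%n", gcTimeFraction() * 100);
    }
}
```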

+1  A: 

Sixty percent is really high. This is often an indication of code churning through a lot of temporary Strings, or something similar. The fact that it's happening in the old generation suggests it might be happening on the database end, maybe while waiting for database transactions to complete. But that's just a horseback guess.

You probably want to make more detailed profiles of the run.

Charlie Martin
+1  A: 

In addition to what Charlie has said, another thing I think might cause this is if you have a lot of objects with finalizers (which some libraries naughtily create); as I recall, this in effect forces them to bypass the VM's fast path for object deallocation.
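For reference, this is the pattern being described: any class that overrides `finalize()` registers every instance with the JVM's finalizer machinery, so the object survives at least one extra GC cycle instead of being reclaimed in bulk. A contrived sketch (the `Leaky` class is hypothetical, not from any particular library; finalization is asynchronous and not guaranteed to run, hence the retry loop):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FinalizerDemo {
    static final AtomicInteger FINALIZED = new AtomicInteger();

    // Overriding finalize() puts every instance on the finalizer queue;
    // it cannot be freed until a finalizer thread has processed it.
    static class Leaky {
        @Override
        protected void finalize() {
            FINALIZED.incrementAndGet();
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100_000; i++) {
            new Leaky(); // immediately garbage, but still queued for finalization
        }
        // Nudge the collector; there is no guarantee finalizers run promptly.
        for (int i = 0; i < 10 && FINALIZED.get() == 0; i++) {
            System.gc();
            System.runFinalization();
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        System.out.println("finalized so far: " + FINALIZED.get());
    }
}
```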

Neil Coffey
+1: Good thought.
Software Monkey
Finalizers are bad! But luckily this is not the issue here.
Andreas Petersson
+1  A: 

After taking a second look at the NetBeans profiler/heap walker, it became clear that there were tons of String instances containing full SQL statements. These were created by log4jdbc, so Charlie Martin's guess was partially correct.

log4jdbc was not configured to log to any appender, but its log level was still set to INFO. Although the log file did not contain any SQL, the statements were still being rendered in the background.
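For anyone hitting the same thing: log4jdbc builds the SQL strings as soon as the logger's level permits it, whether or not an appender is attached, so the fix is to silence its loggers outright rather than just removing appenders. Something like this in `log4j.properties` (logger names taken from log4jdbc's documentation; double-check them against your version):

```
log4j.logger.jdbc.sqlonly=OFF
log4j.logger.jdbc.sqltiming=OFF
log4j.logger.jdbc.audit=OFF
log4j.logger.jdbc.resultset=OFF
log4j.logger.jdbc.connection=OFF
```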

The performance boost from removing log4jdbc was MASSIVE.

Database CPU utilisation went up from 1-2% to 20-50% (one core fully utilized). Previously, a batch of 5000 entries took about 100 SECONDS to insert; without logging, one chunk of 5000 entries is now inserted in about 1-2 SECONDS.

GC now takes about 6-7% of the total CPU time, as it should.

So my conclusion:

Having a GC time of >20% is a clear indication that something is wrong.

Andreas Petersson
What was interesting was that the allocation of those strings was very cheap and did not show up in the profiler.
Andreas Petersson
Thanks Andreas, that's interesting. So log4jdbc sounds like another of those silly frameworks that doesn't add much functionality and just ends up screwing you around... Incidentally, 6-7% CPU used on GC is STILL HIGH, though!
Neil Coffey
I should note that for debugging purposes log4jdbc is great! Just be really sure you turn it off when deploying, and especially for batch jobs.
Andreas Petersson
@Andreas: that's normal for GC'd memory; allocation is incredibly cheap because it just grabs a block at the end of the heap. The cost of this cheapness comes at collection time: the collector not only has to find and free the now-unused blocks, but also compact the heap so that allocation at the end can continue without fragmentation.
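A quick way to see this point in practice: churning through temporary strings is fast in itself, and the cost surfaces as accumulated collection time, which can be read back through the GC MXBeans. A rough sketch (iteration counts and the fake SQL text are arbitrary; exact numbers will vary by JVM and heap settings):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class AllocChurn {
    /** Total accumulated GC time (ms) across all collectors. */
    static long gcTime() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;
        }
        return total;
    }

    public static void main(String[] args) {
        long gcBefore = gcTime();
        long start = System.nanoTime();
        long chars = 0;
        // Simulate a logger building throwaway SQL strings.
        for (int i = 0; i < 1_000_000; i++) {
            String sql = "INSERT INTO t VALUES (" + i + ", 'row-" + i + "')";
            chars += sql.length(); // keep the JIT from eliding the allocation
        }
        long allocMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("built " + chars + " chars in " + allocMillis
                + " ms; GC time grew by " + (gcTime() - gcBefore) + " ms");
    }
}
```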
gbjbaanb
A: 

I agree that up to 10% is usually fine for GC time. In case you have an old gen problem, try adding the -XX:NewSize=300m parameter, which increases the young generation. This helps avoid big heaps of dead objects (garbage) piling up in the old area, especially when you mostly create short-lived local objects and do not intentionally retain anything for longer.
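For completeness, these sizing flags go on the launch command; -XX:NewSize sets the initial young-generation size and -XX:MaxNewSize caps it. For example (the heap sizes and the jar name are illustrative only):

```
java -Xms512m -Xmx512m -XX:NewSize=300m -XX:MaxNewSize=300m -jar batch-importer.jar
```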

ReneS