ansaurus

Question

What is the most memory efficient way to write from a database to a (zip) file in Java?

Answer 1

A:

Since it's Java, the memory should only spike temporarily, unless you are leaking references, like if you push things onto a list that is a member of a singleton that has life span of the entire program, or in my experience more likely is resource leaking, which happens when (and this I'm assuming applies to Java although I'm thinking of C#) objects that use unmanaged resources like file handles never call their cleanup code, a condition commonly caused by empty exception handlers that do not re-throw to the parent stack frame, which has the net effect of circumventing the finally block...

Gabriel 2010-09-14 07:24:14

It does spike. I'll edit the OP with the screenshot of memory usage profiling.

Andrija 2010-09-14 07:32:01

this is a guess, but maybe the java zipping class needs to have the whole source in memory to produce it's output which gets around the savings of the streamwriter?

Gabriel 2010-09-14 07:36:32

Could be, that's why I asked here :)

Andrija 2010-09-14 09:24:00

Answer 2

+3 A:

Is it possible to use ResultSet.TYPE_FORWARD_ONLY?

You have used ResultSet.TYPE_SCROLL_INSENSITIVE. I believe for some databases (you didn't say which one you use) this causes the whole result set to be loaded in memory.

Thomas Mueller 2010-09-14 07:40:32

We're using Oracle 10.2.0.4. and I've used this and I got results! I'm still suspicious about them (meaning I'll do more testing/profiling) but for now it really looks promising. The new memory usage is included in the OP.

Andrija 2010-09-14 07:53:01

Yes, look at http://download.oracle.com/docs/cd/B19306_01/java.102/b14355/resltset.htm#CIHCHBJB. In general, using anything other than ResultSet.TYPE_FORWARD_ONLY is a bad idea.

gpeche 2010-09-14 07:58:44

Answer 3

A:

I've ran some more tests and the conclusions are:

The biggest gain is in JVM (or visualvm has problems monitoring Java 5 Heap space:). When I first reported that ResultSet.TYPE_FORWARD_ONLY got me a significant gain, I was wrong. The biggest gain was by using Java 5 under which the same program used up to 50MB of heapspace, as opposed to Java 6 under which the same code took up to 150 MB.
Second gain is in ResultSet.TYPE_FORWARD_ONLY which made the program take as small amount of memory as possible.
Third gain is in Sting.intern() which made the program take a bit less memory since it caches strings instead of creating new ones.

This is the usage with the optimizations 2 and 3 (if there wasn't String.intern() the graph would be the same, you should only add 5 MB more to every point)

alt text

and this is the usage without them (the lesser usage at the end is due to the program going out of memory :) ) alt text

Thank you everyone for your assistance.

Andrija 2010-09-15 08:26:22

ansaurus

tags:

views:

answers:

What is the most memory efficient way to write from a database to a (zip) file in Java?

related questions