I have some data processing code which uses the following recipe:

  • Read in as much data as will fit in memory (call this a 'chunk')
  • Perform processing on the chunk
  • Write out processed chunk to disk
  • Repeat
  • ...
  • Merge all the processed chunks to get the final answer.

This last stage is most efficient when there are as few chunks as possible, so I want the first stage to read in as much data as will fit in memory. I can do this by querying Runtime.freeMemory().

However, this means I need to call System.gc(), or the number returned by Runtime.freeMemory() is much smaller than the amount of memory I could safely allocate.
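To illustrate part of the gap: as far as I understand it, freeMemory() only reports free space inside the heap the JVM has currently committed (totalMemory()), which can be well below the ceiling reported by maxMemory(). Here is a minimal sketch of an estimate that takes this into account. The class and method names are made up for this example, and garbage that has not yet been collected still counts as used, so the result is only a conservative lower bound.

```java
public class HeapHeadroom {
    /**
     * Estimates how many more bytes could be allocated before the heap
     * reaches its maximum size. Uncollected garbage is counted as used,
     * so this tends to underestimate what a collection would free up.
     */
    static long estimateAllocatableBytes() {
        Runtime rt = Runtime.getRuntime();
        // Bytes currently occupied in the committed heap (live objects plus garbage).
        long used = rt.totalMemory() - rt.freeMemory();
        // maxMemory() is the most the JVM will attempt to use (e.g. as set by -Xmx).
        return rt.maxMemory() - used;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("freeMemory():       " + rt.freeMemory());
        System.out.println("estimated headroom: " + estimateAllocatableBytes());
    }
}
```

This does not remove the dependence on when the collector last ran; it only avoids mistaking an uncommitted heap for a full one.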

I have heard a number of authorities say that calling System.gc() explicitly is a bad idea. Is there any way I can avoid this?

A: 

Cache the first value returned by Runtime.freeMemory(), reuse it, and let the VM do the work.

Marco van de Voort
Surely this will only help if I have a reliable way of measuring how much memory I am allocating as I go along.
Simon Nickerson
Even then it won't work, since the VM might have yielded pages which are subsequently allocated by other processes. But the rule of thumb in such cases is that the first approximation is at least predictable.
Marco van de Voort
A: 

Very nice timing. I asked this earlier today and got some useful answers; I hope it helps.

EDIT: this doesn't really answer your question, but it covers why calling System.gc() is not a good idea.

Savvas Dalkitsis
A: 

Use JConsole or something like that.

dfa
Use JConsole to do what, sorry? I'm having to do this programmatically.
Simon Nickerson
So study the JConsole sources...
dfa
+2  A: 

Even if you call System.gc() right before checking how much memory you have, there is no guarantee that a garbage collection will actually have occurred. Personally, I wouldn't bother: I'd set a fixed chunk size (preferably configured through a property or similar) and always use that. If the rest of your program is simple enough, you can just use the chunk size plus a fixed number of megabytes as the heap size. If the memory footprint of your program is too uncertain for other reasons, you could look into running two programs side by side and using an IPC mechanism.

Of course it could well be that your code needs more fine-grained control over memory, but I'd humbly suggest you're using the wrong language then, or at least the wrong runtime (there are real-time Java offerings out there; I assume they're more geared toward this sort of thing).

I'm sorry if this doesn't seem like the most useful answer, but basically I'm wondering whether you really need this?

wds
A: 

The reason calling System.gc() is a bad idea is most likely because it does not guarantee anything.

If you really want to be certain that the JVM does garbage collection, you must tell it to. One way is to do what JConsole does, namely go through JMX.

See http://java.sun.com/j2se/1.5.0/docs/guide/management/agent.html#local

Thorbjørn Ravn Andersen
+1  A: 

"The reason calling System.gc() is a bad idea is most likely because it does not guarantee anything."

The real reason that calling System.gc() is a bad idea is that the JVM is best at knowing the optimal time to run the GC; i.e. when the heap is full. If you call System.gc() at some other time you are telling the JVM to do something expensive and wasteful.

Back to the original question, I think that the best solution is to not try to code the application to second-guess the memory allocator. Instead, code the application so that the chunk size is a command-line parameter / system property / whatever, and manually tune the chunk size against the JVM memory size. You probably also want to ensure that the JVM initial and max memory sizes are the same.
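As a rough sketch of the system-property route, something like the following would do. The property name chunk.size.bytes and the 64 MB default are placeholders invented for this example, not anything from the question.

```java
public class ChunkConfig {
    // Hypothetical property name and default; neither comes from the original post.
    static final String CHUNK_SIZE_PROPERTY = "chunk.size.bytes";
    static final long DEFAULT_CHUNK_BYTES = 64L * 1024 * 1024;

    /** Reads the chunk size from a system property, falling back to the default. */
    static long chunkSizeBytes() {
        // Long.getLong returns the default if the property is unset or not a number.
        long size = Long.getLong(CHUNK_SIZE_PROPERTY, DEFAULT_CHUNK_BYTES);
        if (size <= 0) {
            throw new IllegalArgumentException(
                    CHUNK_SIZE_PROPERTY + " must be positive: " + size);
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println("Chunk size in bytes: " + chunkSizeBytes());
    }
}
```

You would then launch with matching initial and maximum heap sizes and tune the property by hand, for example java -Xms2g -Xmx2g -Dchunk.size.bytes=1500000000 ChunkConfig (the numbers here are purely illustrative).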

Stephen C
A: 

The JVM Tool Interface (JVMTI) has a function named ForceGarbageCollection. You could write some JNI to call it.

Something like:

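#include "jvmti.h"
#include "jni.h"

/* JVMTI environment, obtained once when the agent is loaded. */
static jvmtiEnv *jvmti = NULL;

JNIEXPORT jint JNICALL
Agent_OnLoad(JavaVM *vm, char *options, void *reserved) {
    /* Fail agent loading if the JVMTI environment is unavailable. */
    if ((*vm)->GetEnv(vm, (void **)&jvmti, JVMTI_VERSION_1) != JNI_OK) {
        return JNI_ERR;
    }
    return JNI_OK;
}

/* Placeholder name; the real one must follow the JNI naming convention.
   The jclass parameter assumes a static native method. */
JNIEXPORT void JNICALL my_mangled_function_name_that_is_entirely_too_long_to_be_easy_to_use (JNIEnv *env, jclass cls) {
    jvmtiError error = (*jvmti)->ForceGarbageCollection(jvmti);
    (void)error;  /* compare against JVMTI_ERROR_NONE if you want to trap the error */
}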

BTW, this is a bad idea. I only use this code for debugging (to ensure that certain classes, like listeners, have no more reachable references).

My bet is that the VM will gc all possible data before throwing an OutOfMemoryError.

KitsuneYMG
A: 

What about using JMX? In particular the MemoryMXBean:

MemoryMXBean memoryMXBean = ManagementFactory.getMemoryMXBean();

Also check the MemoryUsage class.
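A small sketch of how the two fit together, in case it helps. The class name HeapUsageReport is just for this example; the calls themselves are from java.lang.management.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapUsageReport {
    /** Returns a snapshot of current heap usage from the platform MemoryMXBean. */
    static MemoryUsage heapUsage() {
        MemoryMXBean memoryMXBean = ManagementFactory.getMemoryMXBean();
        return memoryMXBean.getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage heap = heapUsage();
        System.out.println("used:      " + heap.getUsed());      // bytes currently in use
        System.out.println("committed: " + heap.getCommitted()); // bytes guaranteed available to the JVM
        System.out.println("max:       " + heap.getMax());       // -1 if the maximum is undefined
    }
}
```

Note that the used figure still includes objects that are garbage but not yet collected, so this gives the same kind of estimate as the Runtime methods, just with more detail.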

dfa