views:

667

answers:

7

I have this class and I'm testing insertions with different data distributions. I'm doing this in my code:

...

AVLTree tree = new AVLTree();

//insert the data from the first distribution

//get results

...

tree = new AVLTree();

//inser the data from the next distribution

//get results

...

I'm doing this for 3 distributions. Each one should be tested an average of 14 times, and the 2 lowest/highest values removed from to compute the average. This should be done 2000 times, each time for 1000 elements. In other words, it goes 1000, 2000, 3000, ..., 2000000.

The problem is, I can only get as far as 100000. When I tried 200000, I ran out of heap space. I increased the available heap space with -Xmx in the command line to 1024m and it didn't even complete the tests with 200000. I tried 2048m and again, it wouldn't work.

What I'm thinking is that the garbage collector isn't getting rid of the old trees once I do tree = new AVL Tree(). But why? I thought that the elements from the old trees would no longer be accessible and their memory would be cleaned up.

A: 

The Java garbage collector isn't guaranteed to garbage collect after each object's refcount becomes zero. So if you're writing code that is only creating and deleting a lot of objects, it's possible to expend all of the heap space before the gc has a chance to run. Alternatively, Pax's suggestion that there is a memory leak in your code is also a strong possibility.

If you are only doing benchmarking, then you may want to use the java gc function (in the System class I think) between tests, or even re-run you program for each distribution.

Dana the Sane
My understanding is that you're correct - it won't GC when the refcount becomes zero. But it *will* GC if it runs out of heap. If that GC doesn't free up enough, *then* you will get the error.
paxdiablo
A GC is triggered by an event. Memory allocations that cannot be satisfied is such an event... so: No memory free? Clean what we have and check again... if the memory is still not free (enough), clean again... if nothing can be cleaned anymore throw an OOM exception.
ReneS
"refcount" doesn't really make sense in Java. Reference counting is not a sensible way of implementing GC in a multithreaded system.
Tom Hawtin - tackline
+5  A: 

The garbage collector should have no trouble cleaning up your old tree objects, so I can only assume there's some other allocation that you're doing that's not being cleaned up.

paxdiablo
Is deleting the AVL tree a strong enough hint to the jre to cleanup, or would it help to traverse the tree an null all of the nodes?
Dana the Sane
The Java GC can handle that, you don't need to clean up the nodes.
paxdiablo
Definitely do not assign null to all of the nodes. Java garbage collectors work extremely well. Explicitly nulling references doesn't help.
erickson
The Java GC has a terrible time when dealing with lots (e.g. millions) of small objects. In certain situations, the GC can't keep up, even when calling System.gc() explicitly.
Rick C. Petty
@Rick, that shouldn't cause an out-of-heap error, just a "stop-the-world" GC to clean up the remaining nodes. There's something else going on in the app that's making objects *not* being made available for GC (IMO, given the limited info).
paxdiablo
We were seeing heap errors, which was what led us to investigate the problem. I agree that this is not how it should happen. I'm just describing our experience.
Rick C. Petty
A: 

We noticed this in a server product. When making a lot of tiny objects that quickly get thrown away, the garbage collector can't keep up. The problem is more pronounced when the tiny objects have pointers to larger objects (e.g. an object that points to a large char[]). The GC doesn't seem to realize that if it frees up the tiny object, it can then free the larger object. Even when calling System.gc() directly, this was still a huge problem (both in 1.5 and 1.6 VMs)!

What we ended up doing and what I recommend to you is to maintain a pool of objects. When your object is no longer needed, throw it into the pool. When you need a new object, grab one from the pool or allocate a new one if the pool is empty. This will also save a small amount of time over pure allocation because Java doesn't have to clear (bzero) the object.

If you're worried about the pool getting too large (and thus wasting memory), you can either remove an arbitrary number of objects from the pool on a regular basis, or use weak references (for example, using java.util.WeakHashMap). One of the advantages of using a pool is that you can track the allocation frequency and totals, and you can adjust things accordingly.

We're using pools of char[] and byte[], and we maintain separate "bins" of sizes in the pool (for example, we always allocate arrays of size that are powers of two). Our product does a lot of string building, and using pools showed significant performance improvements.

Note: In general, the GC does a fine job. We just noticed that with small objects that point to larger structures, the GC doesn't seem to clean up the objects fast enough especially when the VM is under CPU load. Also, System.gc() is just a hint to help schedule the finalizer thread to do more work. Calling it too frequently causes a significant performance hit.

Rick C. Petty
Pooling objects to spare yourself allocation/de-allocation is usually a net loss. It's only worthwhile when creating resources with extra overhead, such as disk or network I/O. By the same token, calling System.gc explicitly is a fast way to ruin performance. Tune the garbage collector. It works.
erickson
How do you justify your statements? Allocating large buffers takes significantly longer than the time to pull that buffer out of a pool. I would be happy to post our performance numbers. I would also love to hear your recommendations on GC tuning.
Rick C. Petty
Your problem there seem to stem from large objects not being able to be allocated in the young (eden and survivor) spaces. GC operating in the young space will not recover memory from the tenured generation.
Tom Hawtin - tackline
+5  A: 

Java has a good tool to watch the GC in progress (or not in your case), JVisualVM, which comes with the JDK.

Just run that and it will show you which objects are taking up the heap, and you can both trigger and see the progress of GC's. Then you can target those for pools so they can be re-used by you, saving the GC the work.

Also look into this option, which will probably stop the error you're getting that stops the program, and you program will finish, but it may take a long time because your app will fill up the heap then run very slowly.

-XX:-UseGCOverheadLimit

justinhj
+1  A: 

Which JVM you are using and what JVM parameters you have used to configure GC?

Your explaination shows there is a memory leak in your code. If you have any tool like jprofiler then use it to find out where is the memory leak.

Bhushan
+1  A: 

There's no reason those trees shouldn't be collected, although I'd expect that before you ran out of memory you should see long pauses as the system ran a full GC. As it's been noted here that that's not what you're seeing, you could try running with flags like -XX:-PrintGC, -XX:-PrintGCDetails,-XX:-PrintGCTimeStamps to give you some more information on exactly what's going on, along with perhaps some sort of running count of roughly where you are. You could also explicitly tell the garbage collector to use a different garbage-collection algorithm.

However, it still seems unlikely to me. What other code is running? is it possible there's something in the AVLTree class itself that's keeping its instances from being GC'd? What about manually logging the finalize() on that class to insure that (some of them, at least) are collectible (e.g. make a few and manually call System.gc())?

GC params here, a nice ref on garbage collection from sun here that's well worth reading.

Steve B.
A: 

Given that you're just doing this for testing purposes, it might just be good housekeeping to invoke the garbage collector directly using System.gc() (thus forcing it to make a pass). It won't help you if there is a memory leak, but if there isn't, it might buy you back enough memory to get through your test.

Ian McLaird