I have been reading these slides about Java finalizers. In them, the author describes a scenario (on slide 33) in which CleanResource.finalize() could be run by the finalizer thread while CleanResource.doSomething() is still running on another thread. How could this happen?

If doSomething() is a non-static method, then to execute that method someone, somewhere must have a strong reference to it... right? So how could this reference get cleared out before the method returns? Can another thread swoop in and null out that reference? If that happened, would doSomething() still return normally on the original thread?

That's all I really want to know, but for a really above-and-beyond answer, you can tell me why the doSomething() on slide 38 is better than the doSomething() on slide 29. Why is it sufficient to simply invoke this keepAlive() method? Wouldn't you need to wrap the whole call to myImpl.doSomething() in a synchronized(this){} block?

A: 

I agree. The slides seem to be suggesting an optimization that would result in objects being used despite being treated by the garbage collector as unreachable. That has to be an invalid optimization.

Consider that JLS 12.6.1 defines reachable as follows:

"A reachable object is any object that can be accessed in any potential continuing computation from any live thread."

Any object which has an instance method still executing is likely to be accessed during the execution of that method. So it is reachable.

Besides, I think he's wrong at an implementation level as well. Even if the location or register where the caller got the object reference no longer contains it, the reference will also be in a register / stack location in the context of the called method. That is sufficient to make it reachable.

Perhaps the original author doesn't realize that the GC will also treat all references in registers or saved register sets as "roots".

Stephen C
the article states that register allocation is used rather than stack allocation. The method call is being optimized so that `this` is held in a register and used as a base to compute offsets for fields. `this` does not have to be on the stack - it's just a pointer.
mdma
@mdma - the object is reachable by the JLS definition. If the GC didn't realize that, that would be a GC bug.
Stephen C
But if no following bytecode in that method accesses the `this` reference, then it cannot be accessed in any continuation, no?
Jörn Horstmann
Remember that the compiler/CPU can reorder execution. In code that reads a reference, calls a method, then clears the reference, the clearing can be done before the method call. The `this` pointer is maintained in a register only for as long as it is needed (e.g. to access myIndex). When the register is cleared, the VM is free to reclaim the object. It is unreachable, by the JLS spec, with the consequence that the finalizer and a regular method may execute concurrently.
mdma
No. The fact that the regular method on the object *is executing* means that the object is reachable, according to the definition quoted above.
Stephen C
Simply not. If the method does not access any of the object's members in the future, then the method has no influence on the lifetime of the object.
mdma
+3  A: 

EDIT3:

The upshot is that the finalizer and a regular method can be executed concurrently on the same instance. Here's an explanation of how that can happen. The code is essentially:

class CleanResource {
   int myIndex;
   static ArrayList<ResourceImpl> all;

   void doSomething() {
     ResourceImpl impl = all.get(myIndex);
     impl.doSomething();
   } 

   protected void finalize() { ... }
}

Given this client code:

CleanResource resource = new CleanResource(...);
resource.doSomething();
resource = null; 

This might be JITed to something like this pseudo-C:

register CleanResource* res = ...; // call ctor etc.
// inline CleanResource.doSomething()
register int myIndex = res->myIndex;
ResourceImpl* impl = all->get(myIndex);
impl->doSomething();
// end of inline CleanResource.doSomething()
res = null;

Executed like that, res is cleared only after the inlined CleanResource.doSomething() is done, so the instance cannot be collected until that method has finished executing. There is no possibility of finalize() executing concurrently with another instance method on the same instance.

But res is not read again after myIndex has been loaded, and given that there are no fences, the write that clears it can be moved earlier in the execution, to immediately after that load:

register CleanResource* res = ...; // call ctor etc.
// inline CleanResource.doSomething()
register int myIndex = res->myIndex;
res = null;    /// <-----
ResourceImpl* impl = all->get(myIndex);
impl->doSomething();
// end of inline CleanResource.doSomething()

At the marked location (<-----), there are no references to the CleanResource instance, so it is eligible for collection and its finalizer may be called. Since the finalizer can be called any time after the last reference is cleared, it is possible for the finalizer and the remainder of CleanResource.doSomething() to execute in parallel.
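
For readers on Java 9 or later, here is a rough sketch of one way to close that window using java.lang.ref.Reference.reachabilityFence(Object), which keeps its argument strongly reachable up to the point of the call. This is not what the slides do. The ResourceImpl body, the constructor, and the locking on the list are hypothetical stand-ins added so the example compiles on its own.

```java
import java.lang.ref.Reference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the slides' ResourceImpl.
class ResourceImpl {
    private int calls = 0;

    synchronized void doSomething() { calls++; }

    synchronized int calls() { return calls; }
}

class CleanResource {
    static final List<ResourceImpl> all = new ArrayList<>();
    private final int myIndex;

    // Hypothetical constructor: registers the impl and remembers its slot.
    CleanResource(ResourceImpl impl) {
        int index;
        synchronized (all) {
            all.add(impl);
            index = all.size() - 1;
        }
        myIndex = index;
    }

    void doSomething() {
        try {
            ResourceImpl impl;
            synchronized (all) {
                impl = all.get(myIndex); // last read that goes through `this`
            }
            impl.doSomething();
        } finally {
            // Keeps `this` strongly reachable up to this point, so the
            // instance cannot be finalized while impl.doSomething() runs.
            Reference.reachabilityFence(this);
        }
    }
}
```

The try/finally is there so the fence is still reached if impl.doSomething() throws.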

EDIT2: The keepAlive() call ensures that the this pointer is accessed at the end of the method, so the compiler cannot optimize away the use of the pointer, and that this access happens in the order specified (the synchronized keyword marks a fence that disallows reordering of reads and writes across that point).
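
A minimal sketch of that pattern, assuming keepAlive() is an empty synchronized method called as the last step of doSomething() and that the finalizer also synchronizes on the instance. The ResourceImpl body, the constructor, release(), and the finalizer body are hypothetical stand-ins, not the slides' code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the slides' ResourceImpl.
class ResourceImpl {
    private boolean released = false;
    private int calls = 0;

    synchronized void doSomething() {
        if (released) throw new IllegalStateException("used after release");
        calls++;
    }

    synchronized void release() { released = true; }

    synchronized int calls() { return calls; }
}

class CleanResource {
    static final List<ResourceImpl> all = new ArrayList<>();
    private final int myIndex;

    // Hypothetical constructor: registers the impl and remembers its slot.
    CleanResource(ResourceImpl impl) {
        int index;
        synchronized (all) {
            all.add(impl);
            index = all.size() - 1;
        }
        myIndex = index;
    }

    void doSomething() {
        try {
            ResourceImpl impl;
            synchronized (all) {
                impl = all.get(myIndex);
            }
            impl.doSomething();
        } finally {
            keepAlive(); // `this` is still needed here, after the real work
        }
    }

    // Empty on purpose: it only takes and releases the lock on `this`.
    synchronized void keepAlive() { }

    // Assumed cleanup: synchronizes on `this`, then releases the impl.
    @Override
    protected synchronized void finalize() {
        ResourceImpl impl;
        synchronized (all) {
            impl = all.get(myIndex);
        }
        impl.release();
    }
}
```

The finalizer's synchronization matters here: JLS 12.6 says that if an object's finalizer can result in synchronization on that object, the object must be considered reachable whenever a lock on it is held. The try/finally is an addition so keepAlive() still runs if impl.doSomething() throws.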

Original Post:

The example is saying that once doSomething() is called, the data referenced via the this pointer can be read early (myIndex in the example). Once that data has been read, the this pointer is no longer needed in the method, and the CPU/compiler might overwrite the register holding it, leaving the object no longer reachable. The finalizer could then run while the object's doSomething() method is still executing.

But since the this pointer is not used, it's hard to see how this will have any tangible effect.

EDIT: Well, perhaps if there are cached pointers to the object's fields, computed from this before the object was reclaimed, then once the object is reclaimed those memory references become invalid. There's a part of me that has a hard time believing this is possible, but then again, this does seem to be a tricky corner case, and I don't think there is anything in JSR-133 to prevent this happening by default. It's a question of whether an object is considered to be referenced only by pointers to its base or by pointers to its fields as well.

mdma
So could this happen on a stack-based VM as well? Now that I'm thinking at bytecode level like you are I'm understanding this much better. I guess in the case of a stack machine the reference doesn't need to be on the stack for `invokevirtual` to complete? So once the `myIndex` value is retrieved then `this` can be popped off the stack and potentially reclaimed?
Neil Traft
On a strict stack-based implementation this is not possible, since the this pointer will remain on the stack. But with method inlining, out-of-order execution and register allocation, a concurrent call to the finalizer becomes possible. See my latest edit in this continuing saga. :)
mdma
Great answer! I didn't even know anyone other than Android's Dalvik VM had implemented a register-based JVM. Is this common?
Neil Traft
As far as I know, the JIT is not confined to stack-based conventions, e.g. method inlining is very common, and has become more aggressive in recent versions, since it allows the JIT to perform more optimizations with larger code blocks. See http://java.sun.com/products/hotspot/whitepaper.html#method
mdma