views:

177

answers:

3

Hello everyone,

I am writing a Bytecode instrumenter. Right now, I am trying to find out how to do that in the presence of objects. I would like some clarifications on two lines I read in the JVMS (section 4.9.4):

1) "The verifier rejects code that uses the new object before it has been initialized."

My question is, what does "uses" mean here? I'm guessing that it means: passing it as a method argument, calling GETFIELD and PUTFIELD on it, or calling any instance method on it. Are there other forbidden uses? And I believe it follows that other instructions such as DUP, LOAD and STORE are allowed.

2) "Before that method invokes another instance initialization method of myClass or its direct superclass on this, the only operation the method can perform on this is assigning fields declared within myClass."

Which means that in an <init> method, GETFIELD and PUTFIELD are allowed before another <init> is called. However, in Java, doing any operation on an instance field before a call to super() or this() results in a compilation error. Could someone clarify this?

3) I have one more question. When does an object reference become initialized, and hence ready to be freely used? From reading the JVMS, I came up with the answer that whether an object is initialized or not is decided per method. At a given point in time, the object can be initialized for one method but not for another. Specifically, an object becomes initialized for a method when the <init> called by that method returns.

For example, consider that the main() method created an object and called <init>, which then called the superclass's <init>. After returning from super(), the object is now considered initialized by <init>, but is not yet initialized for main(). Does this mean that in <init>, after super(), I can pass the object as a parameter to a method, even before returning to main()?
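At the Java source level, the scenario in question 3 can be sketched as below. This is only an illustration, assuming a hypothetical Widget class and a hypothetical register() helper (neither is from the JVMS); it shows that javac-generated code passes this to another method after super() has returned but before control is back in the caller.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapeAfterSuperDemo {
    // Hypothetical registry used only to observe the reference.
    static final List<Object> REGISTRY = new ArrayList<>();

    static void register(Object o) {
        REGISTRY.add(o);
    }

    static class Widget {
        Widget() {
            super();        // invokespecial Object.<init> on this
            register(this); // this is passed on before <init> returns to the caller
        }
    }

    // Returns true if the object stored by the constructor is the one `new` produced.
    static boolean escapedBeforeNewReturned() {
        REGISTRY.clear();
        Widget w = new Widget();
        return REGISTRY.size() == 1 && REGISTRY.get(0) == w;
    }

    public static void main(String[] args) {
        System.out.println(escapedBeforeNewReturned()); // prints true
    }
}
```

This only demonstrates what the Java compiler emits and a standard JVM accepts; it does not settle how a particular verifier implementation treats hand-written bytecode.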

Could someone confirm that this whole analysis is true? Thank you for your time.

PS: I have actually posted the same question to the Sun forums, but with no response. I hope I'll have more luck here. Thank you.

Update

First thank you for your answers and time. Although I didn't get a clear-cut answer (I had many questions and some of them were a bit vague), your answers and examples, and the subsequent experiments, were extremely useful for me in understanding more deeply how the JVM works.

The main thing I discovered is that the verifier's behavior differs across implementations and versions (which makes the job of bytecode manipulation much more complicated). The problem lies in non-conformity to the JVMS, a lack of documentation from the verifier's developers, or some subtle vagueness in the JVMS regarding the verifier.

One last thing, SO Rocks!!! I posted the same question in the official Sun JVM Specifications forum, and I still have not received an answer.

+1  A: 

I suggest that you download a copy of the OpenJDK sources and look at what the verifier is actually checking. If nothing else, that may help you understand what the JVM specification is saying.

(However, @Joachim is right. Relying on what the verifier implementation does rather than what the specification says is rather risky.)

Stephen C
It's not a bad idea, but you should be careful with this. In the past the actual verifier used to be less strict about the rules than the specification. It's entirely possible that such cases still occur.
Joachim Sauer
Yes, I discovered that different implementations of the verifier have different levels of strictness. For example, Apache BCEL's JustIce verifier doesn't allow STORE on uninitialized object references, while Java HotSpot 10.0 does.
HH
+3  A: 

"The verifier rejects code that uses the new object before it has been initialized."

In bytecode verification, since the verifier works at link-time, the types of local variables of methods are inferred. The types of method arguments are known as they are in the method signature in the class file. The types of other local variables are not known and are inferred, so I assume the "uses" in the above statement relates to this.

EDIT: The section 4.9.4 of the JVMS reads:

The instance initialization method (§3.9) for class myClass sees the new uninitialized object as its this argument in local variable 0. Before that method invokes another instance initialization method of myClass or its direct superclass on this, the only operation the method can perform on this is assigning fields declared within myClass.

This assignment of fields in the above statement is the "initial" initialization of the instance variables to default initial values (for example, 0 for int and 0.0f for float) when the memory for the object is allocated. There is one more "proper" initialization of instance variables when the virtual machine invokes the instance initialization method (constructor) on the object.


The link provided by Jörn Horstmann helped clarify things. So these statements don't hold true. "This DOES NOT mean that in an <init> method, getfield and putfield are allowed before another <init> is called." The getfield and putfield instructions are used to access (and change) the instance variables (fields) of a class (or an instance of a class). And this can happen only when the instance variables (fields) are initialized.

From the JVMS :

Each instance initialization method (§3.9), except for the instance initialization method derived from the constructor of class Object, must call either another instance initialization method of this or an instance initialization method of its direct superclass super before its instance members are accessed. However, instance fields of this that are declared in the current class may be assigned before calling any instance initialization method.

When the Java Virtual Machine creates a new instance of a class, either implicitly or explicitly, it first allocates memory on the heap to hold the object's instance variables. Memory is allocated for all variables declared in the object's class and in all its superclasses, including instance variables that are hidden. As soon as the virtual machine has set aside the heap memory for a new object, it immediately initializes the instance variables to default initial values. Once the virtual machine has allocated memory for the new object and initialized the instance variables to default values, it is ready to give the instance variables their proper initial values. The Java Virtual Machine uses two techniques to do this, depending upon whether the object is being created because of a clone() invocation. If the object is being created because of a clone(), the virtual machine copies the values of the instance variables of the object being cloned into the new object. Otherwise, the virtual machine invokes an instance initialization method on the object. The instance initialization method initializes the object's instance variables to their proper initial values. And only after this can you use getfield and putfield.

The Java compiler generates at least one instance initialization method (constructor) for every class it compiles. If the class declares no constructors explicitly, the compiler generates a default no-arg constructor that just invokes the superclass no-arg constructor. And rightly so, doing any operation on an instance field before a call to super() or this() results in a compilation error.

An <init> method can contain three kinds of code: an invocation of another <init> method, code that implements any instance variable initializers, and code for the body of the constructor. If a constructor begins with an explicit invocation of another constructor in the same class (a this() invocation) its corresponding <init> method will be composed of two parts:

  • an invocation of the same-class <init> method
  • the bytecodes that implement the body of the corresponding constructor

If a constructor does not begin with a this() invocation and the class is not Object, the <init> method will have three components:

  • an invocation of a superclass <init> method
  • the bytecodes for any instance variable initializers
  • the bytecodes that implement the body of the corresponding constructor


If a constructor does not begin with a this() invocation and the class is Object (and Object has no superclass), then its <init> method can't begin with a superclass <init> method invocation. If a constructor begins with an explicit invocation of a superclass constructor (a super() invocation), its <init> method will invoke the corresponding superclass <init> method.
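The composition described above can be observed at runtime with a small sketch. The class names (Base, Derived) and the note() helper are hypothetical and exist only to record the order in which the parts of each <init> method run.

```java
import java.util.ArrayList;
import java.util.List;

public class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical helper: records a step and returns the value to assign.
    static int note(String step, int value) {
        LOG.add(step);
        return value;
    }

    static class Base {
        Base() {
            LOG.add("superclass <init>");
        }
    }

    static class Derived extends Base {
        int x = note("instance variable initializer", 7);

        Derived() {
            super(); // 1. superclass <init>, 2. initializers, 3. body
            LOG.add("constructor body");
        }

        Derived(int ignored) {
            this(); // same-class <init>; no initializer code in this <init>
            LOG.add("delegating constructor body");
        }
    }

    static List<String> run() {
        LOG.clear();
        new Derived(1);
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        // prints [superclass <init>, instance variable initializer,
        //         constructor body, delegating constructor body]
        System.out.println(run());
    }
}
```

The initializer is logged exactly once, which is consistent with the initializer bytecodes being placed only in the <init> method that invokes the superclass <init>.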



I think this answers your first and second question.

Updated:

For example,

  class Demo
  {
     int somint;

     Demo() // first constructor: delegates to Demo(int)
     {
        this(5);
        // some other stuff...
     }

     Demo(int i) // second constructor: implicit super() call
     {
        this.somint = i;
        // some other stuff...
     }

     Demo(int i, int j) // third constructor: explicit super() call
     {
        super();
        // other stuff...
     }
  }

Here's the bytecode for the above three constructors from the compiler (javac):

Demo();
  Code:
   Stack=2, Locals=1, Args_size=1
   0:   aload_0
   1:   iconst_5
   2:   invokespecial   #1; //Method "<init>":(I)V
   5:   return

Demo(int);
  Code:
   Stack=2, Locals=2, Args_size=2
   0:   aload_0
   1:   invokespecial   #2; //Method java/lang/Object."<init>":()V
   4:   aload_0
   5:   iload_1
   6:   putfield        #3; //Field somint:I
   9:   return

Demo(int, int);
  Code:
   Stack=1, Locals=3, Args_size=3
   0:   aload_0
   1:   invokespecial   #2; //Method java/lang/Object."<init>":()V
   4:   return

In the first constructor, the <init> method begins by calling the same-class <init> method and then executes the body of the corresponding constructor. Because the constructor begins with a this(), its corresponding <init> method does not contain bytecode for initializing the instance variables.

In the second constructor, the <init> method for the constructor has

  • the superclass <init> method, i.e., an invocation of the superclass no-arg constructor; the compiler generated this by default because no explicit super() was found as the first statement.
  • the bytecode for initializing the instance variable somint.
  • the bytecode for the rest of the constructor body.
Zaki
PS: I haven't written a bytecode instrumenter myself, but I've been reading about the JVM lately, for example Inside the JVM.
Zaki
First, thank you for your time!!! I still have a few questions: You said that a constructor with no this() has 3 components. However, I have read the bytecode, and the compiler never adds bytecodes for default instance variable initialization (0 for int, null for objects). It seems that the JVM initializes the instance fields without the need for bytecode that explicitly does that.
HH
@HH, see updated answer
Zaki
I don't think this is correct. The initialization of fields to their default values happens during execution of `new` opcode, before the actual constructor, and is not reflected in the bytecode of the constructor. The sentence from the jvms can then only refer to actual `putfield` opcodes, which therefore seem to be allowed.
Jörn Horstmann
@Jörn Horstmann, is this behavior the same across the JustIce verifier and others?
Zaki
@Zaki, I believe this is a bug in the BCEL verifier. Here http://scala-programming-language.1934581.n4.nabble.com/Object-initialization-order-in-Scala-td1987968.html is a discussion about bytecode generated by Scala, which also initializes fields before calling the super constructor. Regarding default initializers, you could check this by creating a class containing an uninitialized int field and then checking the bytecode with javap: there are no putfield instructions to set the field to 0. The super constructor does not know about this field, so the initialization has to be done by the JVM.
Jörn Horstmann
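The javap check suggested in the comment above can be complemented by a small runtime sketch. The Holder class is hypothetical; it declares no constructor and no field initializers, so any values observed in its fields were not put there by constructor code in the class itself.

```java
public class DefaultValueDemo {
    // Hypothetical class with no explicit constructor and no initializers.
    static class Holder {
        int count;
        Object ref;
        double ratio;
    }

    // Reads the fields of a freshly created instance without assigning them.
    static String describe() {
        Holder h = new Holder();
        return h.count + "," + h.ref + "," + h.ratio;
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints 0,null,0.0
    }
}
```

Running javap -c on the compiled Holder class should show a constructor that only invokes Object.<init>, which matches the observation that the default values are set when the object is allocated.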
+2  A: 

Contrary to what the Java language specifies, at the bytecode level it is possible to access fields of a class in a constructor before calling the superclass constructor. The following code uses the ASM library to create such a class:

package asmconstructortest;

import java.io.FileOutputStream;
import org.objectweb.asm.*;
import org.objectweb.asm.util.CheckClassAdapter;
import static org.objectweb.asm.Opcodes.*;

public class Main {

    public static void main(String[] args) throws Exception {
        //ASMifierClassVisitor.main(new String[]{"/Temp/Source/asmconstructortest/build/classes/asmconstructortest/Test.class"});
        ClassWriter cw = new ClassWriter(0);
        CheckClassAdapter ca = new CheckClassAdapter(cw);

        ca.visit(V1_5, ACC_PUBLIC + ACC_SUPER, "asmconstructortest/Test2", null, "java/lang/Object", null);

        {
            FieldVisitor fv = ca.visitField(ACC_PUBLIC, "property", "I", null, null);
            fv.visitEnd();
        }

        {
            MethodVisitor mv = ca.visitMethod(ACC_PUBLIC, "<init>", "()V", null, null);
            mv.visitCode();
            mv.visitVarInsn(ALOAD, 0);
            mv.visitInsn(ICONST_1);
            mv.visitFieldInsn(PUTFIELD, "asmconstructortest/Test2", "property", "I");
            mv.visitVarInsn(ALOAD, 0);
            mv.visitMethodInsn(INVOKESPECIAL, "java/lang/Object", "<init>", "()V");
            mv.visitInsn(RETURN);
            mv.visitMaxs(2, 1);
            mv.visitEnd();
        }

        ca.visitEnd();

        FileOutputStream out = new FileOutputStream("/Temp/Source/asmconstructortest/build/classes/asmconstructortest/Test2.class");
        out.write(cw.toByteArray());
        out.close();
    }
}

Instantiating this class works fine, without any verification errors:

package asmconstructortest;

public class Main2 {
    public static void main(String[] args) {
        Test2 test2 = new Test2();
        System.out.println(test2.property);
    }
}
Jörn Horstmann
I've also tested this idea using jbe (Java Bytecode Editor), and it passes the HotSpot 10 verifier (but not Apache BCEL's JustIce).
HH