Consider this simple Java class:

class MyClass {
  public void foo() { }  // declared so that c.foo() below compiles

  public void bar(MyClass c) {
    c.foo();
  }
}

I want to discuss what happens on the line c.foo().

Original, Misleading Question

Note: Not all of this actually happens with each individual invokevirtual opcode. Hint: If you want to understand Java method invocation, don't read just the documentation for invokevirtual!

At the bytecode level, the meat of c.foo() will be the invokevirtual opcode, and, according to the documentation for invokevirtual, more or less the following will happen:

  1. Look up the foo method defined in compile-time class MyClass. (This involves first resolving MyClass.)
  2. Do some checks, including: Verify that foo is not an initialization method, and verify that calling MyClass.foo wouldn't violate any protected modifiers.
  3. Figure out which method to actually call. In particular, look up c's runtime type. If that type has foo(), call that method and return. If not, look up the superclass of c's runtime type; if that type has foo(), call that method and return. If not, continue up the superclass chain in the same way. If no suitable method can be found, raise an error.
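
To make step 3 concrete, here is a minimal sketch of the superclass walk, written with reflection. The names SelectionSketch, Base, Middle, Leaf and selectDeclaringClass are made up for illustration, and the real JVM also applies overriding and access rules that this sketch ignores, so treat it as a rough model rather than what the JVM actually executes.

```java
import java.lang.reflect.Method;

public class SelectionSketch {
    public static class Base {
        public String foo() { return "Base.foo"; }
    }
    public static class Middle extends Base { }   // inherits foo(), declares nothing
    public static class Leaf extends Middle {
        @Override public String foo() { return "Leaf.foo"; }
    }

    // Hypothetical helper: start at the receiver's runtime class and climb
    // toward Object, returning the first class that itself declares the method.
    public static Class<?> selectDeclaringClass(Object receiver, String name, Class<?>... params) {
        for (Class<?> k = receiver.getClass(); k != null; k = k.getSuperclass()) {
            try {
                Method m = k.getDeclaredMethod(name, params);
                return m.getDeclaringClass();
            } catch (NoSuchMethodException e) {
                // Not declared in this class; try the superclass next.
            }
        }
        // Simplification: the JVM raises its own linkage errors here.
        throw new IllegalStateException("no method named " + name);
    }

    public static void main(String[] args) {
        System.out.println(selectDeclaringClass(new Middle(), "foo").getSimpleName()); // Base
        System.out.println(selectDeclaringClass(new Leaf(), "foo").getSimpleName());   // Leaf
    }
}
```

Note that nothing in this walk mentions the compile-time class, which is exactly why step 1 looks redundant at first.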

Step #3 alone seems adequate for figuring out which method to call and verifying that said method has the correct argument/return types. So my question is why step #1 gets performed in the first place. Possible answers seem to be:

  • You don't have enough information to perform step #3 until step #1 is complete. (This seems implausible at first glance, so please explain.)
  • The linking or access modifier checks done in #1 and #2 are essential to prevent certain bad things from happening, and those checks must be performed based on the compile-time type, rather than the run-time type hierarchy. (Please explain.)

Revised Question

The core of the javac compiler output for the line c.foo() will be an instruction like this:

invokevirtual i

where i is an index into MyClass's runtime constant pool. That constant pool entry will be of type CONSTANT_Methodref_info, and will indicate (maybe indirectly) A) the name of the method called (i.e. foo), B) the method signature, and C) the name of the compile-time class that the method is called on (i.e. MyClass).

The question is, why is the reference to the compile-time type (MyClass) needed? Since invokevirtual is going to do dynamic dispatch on the runtime type of c, isn't it redundant to store the reference to the compile-time class?
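
A loose analogy that can be run from plain Java: MethodHandles.Lookup.findVirtual in java.lang.invoke asks for the same three pieces of information (a referenced class, a method name, and a method type) and still dispatches on the receiver's runtime class. The class names MethodrefSketch, A and B are invented for this sketch, and it is only meant to illustrate the shape of the symbolic reference, not the bytecode that javac emits.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class MethodrefSketch {
    public static class A {
        public String foo() { return "A.foo"; }
    }
    public static class B extends A {
        @Override public String foo() { return "B.foo"; }
    }

    // Looks up foo against the static class A, then invokes it on the receiver.
    public static String call(A receiver) {
        try {
            MethodHandle mh = MethodHandles.lookup().findVirtual(
                    A.class,                              // C) class the method is looked up in
                    "foo",                                // A) method name
                    MethodType.methodType(String.class)); // B) return and parameter types
            return (String) mh.invoke(receiver);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(new A())); // A.foo
        System.out.println(call(new B())); // B.foo
    }
}
```

The lookup is phrased entirely in terms of A, yet passing a B instance runs B.foo(), which mirrors the situation described in the question.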

A: 

Java is a statically typed language. #3 alone is a reasonable description of a dynamically typed language.

And the JVM (natively having a similar orientation) repeats those checks during bytecode verification.

John M
Statically typed only implies that #1 and #2 should be done by the compiler. The question is asking why the runtime system must also implement that check.
Michael E
+1  A: 

That's not the way I understand it after reading the documentation. I think you have steps 2 and 3 transposed, which would make the whole series of events more logical.

Rob Heiser
Suppose I do have steps 2 and 3 transposed. (It's plausible. In the docs that I referred to, the sentence, "The named method is resolved" seems ambiguous.) Do you still agree that the JVM is doing some kind of checks against the compile-time type, or do you suspect I have that wrong too? (In particular, are all checks against the run-time type?) I'm still pretty confident the JVM does know that MyClass is the compile-time type associated with the call to foo, even if I'm fuzzy on what it does with that information.
Chris
Upon further reading :) 1) The index calculated from invokevirtual's operands is used to look into the runtime constant pool of MyClass, which will point to a symbolic reference to the method. Something like: MyClass/foo()V. 2) From that symbolic reference, the class "MyClass" is looked up, and the method "void foo()" is looked up in that class and checked for access protection. 3) The runtime type of variable "c" is checked for a method "void foo()", if not it recurses up the class hierarchy until it finds one. Maybe it does the first step so that it can fail fast. Michael E might be right ;)
Rob Heiser
BTW, thanks for asking this question -- very educational.
Rob Heiser
+1  A: 

Presumably, #1 and #2 have already been done by the compiler. I suspect that at least part of the purpose is to make sure that they still hold with the version of the class in the runtime environment, which may be different from the version the code was compiled against.

I haven't digested the invokevirtual documentation to verify your summary, though, so Rob Heiser could be right.

Michael E
+1  A: 

I'm guessing answer "B".

The linking or access modifier checks done in #1 and #2 are essential to prevent certain bad things from happening, and those checks must be performed based on the compile-time type, rather than the run-time type hierarchy. (Please explain.)

#1 is described by 5.4.3.3 Method Resolution, which makes some important checks. For example, #1 checks the accessibility of the method in the compile-time type and may throw an IllegalAccessError if it is not accessible:

...Otherwise, if the referenced method is not accessible (§5.4.4) to D, method resolution throws an IllegalAccessError. ...

If you only checked the run-time type (via #3), then the run-time type could illegally widen the accessibility of the overridden method (a.k.a. a "bad thing"). It's true that the compiler should prevent such a case, but the JVM is nevertheless protecting itself from rogue code (e.g. manually-constructed malevolent code).

Bert F
+1  A: 

It is all about performance. By figuring out the compile-time type (aka: static type), the JVM can compute the index of the invoked method in the virtual function table of the runtime type (aka: dynamic type). Using this index, step 3 simply becomes an access into an array, which can be accomplished in constant time. No looping is needed.

Example:

class A {
   void foo() { }
   void bar() { }
}

class B extends A {
  void foo() { } // Overrides A.foo()
}

By default, A extends Object, which defines these methods (final methods omitted, since they cannot be overridden and so never need a different implementation chosen at run time):

class Object {
  public int hashCode() { ... }
  public boolean equals(Object o) { ... }
  public String toString() { ... }
  protected void finalize() { ... }
  protected Object clone() { ... }
}

Now, consider this invocation:

A x = ...;
x.foo();

By figuring out that x's static type is A the JVM can also figure out the list of methods that are available at this call site: hashCode, equals, toString, finalize, clone, foo, bar. In this list, foo is the 6th entry (hashCode is 1st, equals is 2nd, etc.). This calculation of the index is performed once - when the JVM loads the classfile.

After that, whenever the JVM processes x.foo() it just needs to access the 6th entry in the list of methods that x offers, conceptually equivalent to x.getClass().getMethods()[5] (which points at A.foo() if x's dynamic type is A), and invoke that method. No need to exhaustively search this array of methods.

Note that the method's index remains the same regardless of the dynamic type of x. That is: even if x points to an instance of B, the 6th method is still foo (although this time it will point at B.foo()).
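
The idea can be sketched as a toy model in plain Java. Everything here (VtableSketch, the slot constants, the list-of-lambdas "vtable") is a hypothetical illustration, not how any particular JVM lays out its tables, and the Object methods are left out to keep it short, so foo lands in slot 0 instead of being the 6th entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class VtableSketch {
    // Slot numbers are fixed once, from the method list of the static type A.
    public static final int SLOT_FOO = 0;
    public static final int SLOT_BAR = 1;

    // Table for class A: each slot points at A's own implementation.
    public static List<Supplier<String>> vtableForA() {
        List<Supplier<String>> table = new ArrayList<>();
        table.add(() -> "A.foo");
        table.add(() -> "A.bar");
        return table;
    }

    // Table for class B: copy the parent's table, then overwrite the slot
    // of the overridden method. The slot number itself does not change.
    public static List<Supplier<String>> vtableForB() {
        List<Supplier<String>> table = new ArrayList<>(vtableForA());
        table.set(SLOT_FOO, () -> "B.foo");
        return table;
    }

    // "Dispatch" is a single indexed access; no search up the hierarchy.
    public static String invoke(List<Supplier<String>> vtable, int slot) {
        return vtable.get(slot).get();
    }

    public static void main(String[] args) {
        System.out.println(invoke(vtableForA(), SLOT_FOO)); // A.foo
        System.out.println(invoke(vtableForB(), SLOT_FOO)); // B.foo
        System.out.println(invoke(vtableForB(), SLOT_BAR)); // A.bar
    }
}
```

The call site only ever stores SLOT_FOO; which implementation runs depends on whose table is passed in.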

Update

[In light of your update]: You're right. In order to perform a virtual method dispatch all the JVM needs is the name+signature of the method (or the offset within the vtable). However, the JVM does not execute things blindly. It first checks that the class files loaded into it are correct in a process called verification.

Verification expresses one of the design principles of the JVM: It does not rely on the compiler to produce correct code. It checks the code itself before it allows it to be executed. In particular, the verifier checks that every invoked virtual method is actually defined by the static type of the receiver object. Obviously, the static type of the receiver is needed to perform such a check.

Itay
A: 

To totally understand this stuff, you need to understand how method resolution works in Java. If you're looking for an in-depth explanation, I suggest looking at the book, "Inside the Java Virtual Machine". The following sections from Chapter 8, "The Linking Model", are available online and seem particularly relevant:

(CONSTANT_Methodref_info entries are entries in the class file's constant pool that describe the methods called by that class.)

Thanks to Itay for inspiring me to do the Googling required to find this.

Chris