In a Java constructor, if you want to call another constructor (or a superclass constructor), the call has to be the first statement in the constructor. I assume this is because you shouldn't be allowed to modify any instance variables before the other constructor runs. But why can't you have statements before the constructor delegation, in order to compute the arguments to pass to the other constructor? I can't think of any good reason, and I have hit some real cases where I have written some ugly code to get around this limitation.

So I'm just wondering:

  1. Is there a good reason for this limitation?
  2. Are there any plans to allow this in future Java releases? (Or has Sun definitively said this is not going to happen?)


For an example of what I'm talking about, consider some code I wrote which I gave in this StackOverflow answer. In that code, I have a BigFraction class, which has a BigInteger numerator and a BigInteger denominator. The "canonical" constructor is the BigFraction(BigInteger numerator, BigInteger denominator) form. For all the other constructors, I just convert the input parameters to BigIntegers, and call the "canonical" constructor, because I don't want to duplicate all the work.

In some cases this is easy; for example, the constructor that takes two longs is trivial:

  public BigFraction(long numerator, long denominator)
  {
    this(BigInteger.valueOf(numerator), BigInteger.valueOf(denominator));
  }

But in other cases, it is more difficult. Consider the constructor which takes a BigDecimal:

  public BigFraction(BigDecimal d)
  {
    this(d.scale() < 0 ? d.unscaledValue().multiply(BigInteger.TEN.pow(-d.scale())) : d.unscaledValue(),
         d.scale() < 0 ? BigInteger.ONE                                             : BigInteger.TEN.pow(d.scale()));
  }
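To try the delegation pattern out, the two constructors above can be dropped into a minimal class like the sketch below. The field names, the zero-denominator check and the getters are assumptions for illustration; the real BigFraction from the linked answer presumably also reduces the fraction, which this sketch does not do.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Minimal sketch: field names and accessors are hypothetical, and no reduction is done.
final class BigFraction {
    private final BigInteger numerator;
    private final BigInteger denominator;

    // "Canonical" constructor that every other constructor delegates to.
    public BigFraction(BigInteger numerator, BigInteger denominator) {
        if (denominator.signum() == 0) {
            throw new ArithmeticException("denominator is zero");
        }
        this.numerator = numerator;
        this.denominator = denominator;
    }

    public BigFraction(long numerator, long denominator) {
        this(BigInteger.valueOf(numerator), BigInteger.valueOf(denominator));
    }

    // Each argument of this(...) must be a single expression, hence the duplicated scale test.
    public BigFraction(BigDecimal d) {
        this(d.scale() < 0 ? d.unscaledValue().multiply(BigInteger.TEN.pow(-d.scale())) : d.unscaledValue(),
             d.scale() < 0 ? BigInteger.ONE : BigInteger.TEN.pow(d.scale()));
    }

    public BigInteger getNumerator() { return numerator; }
    public BigInteger getDenominator() { return denominator; }
}
```

For example, new BigFraction(new BigDecimal("1.25")) yields 125/100, and new BigDecimal("1.2E+3") (unscaled value 12, scale -2) yields 1200/1.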

I find this pretty ugly, but it helps me avoid duplicating code. The following is what I'd like to do, but it is illegal in Java:

  public BigFraction(BigDecimal d)
  {
    BigInteger numerator = null;
    BigInteger denominator = null;
    if(d.scale() < 0)
    {
      numerator = d.unscaledValue().multiply(BigInteger.TEN.pow(-d.scale()));
      denominator = BigInteger.ONE;
    }
    else
    {
      numerator = d.unscaledValue();
      denominator = BigInteger.TEN.pow(d.scale());
    }
    this(numerator, denominator);
  }


Update

There have been good answers, but so far none that I'm completely satisfied with. I don't care enough to start a bounty, so I'm answering my own question (mainly to get rid of that annoying "have you considered marking an accepted answer" message).

Workarounds that have been suggested are:

  1. Static factory.
    • I've used the class in a lot of places, so that code would break if I suddenly got rid of the public constructors and went with valueOf() functions.
    • It feels like a workaround for a limitation. I wouldn't get any of the other benefits of a factory, because this class cannot be subclassed and common values are not being cached/interned.
  2. Private static "constructor helper" methods.
    • This leads to lots of code bloat.
    • The code gets ugly because in some cases I really need to compute both numerator and denominator at the same time, and I can't return multiple values unless I return a BigInteger[] or some kind of private inner class.
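The BigInteger[] variant mentioned in the last bullet at least avoids computing the two values twice: the public constructor passes an array to a private constructor, which unpacks it and delegates to the canonical one. This is only a sketch; the helper name split and the array-taking private constructor are hypothetical, and everything except the constructors is omitted.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Sketch of the "compute both values at once" workaround using a BigInteger[].
final class BigFraction {
    private final BigInteger numerator;
    private final BigInteger denominator;

    public BigFraction(BigInteger numerator, BigInteger denominator) {
        this.numerator = numerator;
        this.denominator = denominator;
    }

    // The static helper is evaluated as the argument, so the if/else lives there.
    public BigFraction(BigDecimal d) {
        this(split(d));
    }

    // Private, so callers never see the array-based signature.
    private BigFraction(BigInteger[] parts) {
        this(parts[0], parts[1]);
    }

    // Hypothetical helper: returns {numerator, denominator} for d.
    private static BigInteger[] split(BigDecimal d) {
        if (d.scale() < 0) {
            return new BigInteger[] { d.unscaledValue().multiply(BigInteger.TEN.pow(-d.scale())), BigInteger.ONE };
        }
        return new BigInteger[] { d.unscaledValue(), BigInteger.TEN.pow(d.scale()) };
    }

    public BigInteger getNumerator() { return numerator; }
    public BigInteger getDenominator() { return denominator; }
}
```

The price is an extra private constructor whose array argument carries no compile-time guarantee of holding exactly two elements.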

The main argument against this functionality is that the compiler would have to check that you didn't use any instance variables or methods before calling the superclass constructor, because the object would be in an invalid state. I agree, but I think this would be an easier check than the one which makes sure all final instance variables are always initialized in every constructor, no matter what path through the code is taken. The other argument is that you simply can't execute code beforehand, but this is clearly false: the code that computes the arguments to the superclass constructor is getting executed somewhere, so it must be allowed at the bytecode level.
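Where that argument-computing code runs can be observed directly. In the sketch below (the class and method names are made up for illustration), a static helper used in the this(...) argument list logs when it runs, and the log shows it executing as part of the String constructor, before the delegated constructor's body.

```java
import java.util.ArrayList;
import java.util.List;

// Records the order in which constructor-related code runs.
final class OrderDemo {
    static final List<String> LOG = new ArrayList<>();

    private final int i;

    OrderDemo(String s) {
        this(parse(s));               // parse(s) is evaluated first, inside this constructor
        LOG.add("OrderDemo(String) body");
    }

    OrderDemo(int i) {
        LOG.add("OrderDemo(int) body");
        this.i = i;
    }

    // Static, so it cannot touch the half-built instance.
    private static int parse(String s) {
        LOG.add("parse");
        return Integer.parseInt(s);
    }

    int value() { return i; }
}
```

After new OrderDemo("42"), LOG holds [parse, OrderDemo(int) body, OrderDemo(String) body]. Running javap -c on the class should likewise show the invokestatic of parse inside the String constructor, just before the invokespecial of the int constructor.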

Now, what I'd like to see is a good reason why the compiler couldn't let me take this code:

public MyClass(String s) {
  this(Integer.parseInt(s));
}
public MyClass(int i) {
  this.i = i;
}

And rewrite it like this (the bytecode would be basically identical, I'd think):

public MyClass(String s) {
  int tmp = Integer.parseInt(s);
  this(tmp);
}
public MyClass(int i) {
  this.i = i;
}

The only real difference I see between those two examples is that the "tmp" variable's scope allows it to be accessed after calling this(tmp) in the second example. So maybe a special syntax (similar to static{} blocks for class initialization) would need to be introduced:

public MyClass(String s) {
  //"init{}" is a hypothetical syntax where there is no access to instance
  //variables/methods, and which must end with a call to another constructor
  //(using either "this(...)" or "super(...)")
  init {
    int tmp = Integer.parseInt(s);
    this(tmp);
  }
}
public MyClass(int i) {
  this.i = i;
}
A: 

My guess is that, until a constructor has been called for every level of the hierarchy, the object is in an invalid state. It is unsafe for the JVM to run anything on it until it has been completely constructed.

Yeah, I'm guessing it's something like that, but if you are not acting on any instance variables before calling the other constructor, it shouldn't matter what state the object is in.
Kip
A:

The constructors must be called in order, from the root parent class to the most derived class. You can't execute any code beforehand in the derived constructor because before the parent constructor is called, the stack frame for the derived constructor hasn't even been allocated yet, because the derived constructor hasn't started executing. Admittedly, the syntax for Java doesn't make this fact clear.

Edit: To summarize, when a derived class constructor is "executing" before the this() call, the following points apply.

  1. Member variables can't be touched, because they are invalid before base classes are constructed.
  2. Arguments are read-only, because the stack frame has not been allocated.
  3. Local variables cannot be accessed, because the stack frame has not been allocated.

You can gain access to arguments and local variables if you allocated the constructors' stack frames in reverse order, from derived classes to base classes, but this would require all frames to be active at the same time, wasting memory for every object construction to allow for the rare case of code that wants to touch local variables before base classes are constructed.

Tmdean
In the second example I posted, at the bytecode level it *must* be executing some code before calling the other constructor, right? I mean, it evaluates two ?: expressions and puts the results into temporary variables somewhere.
Kip
Yes, but from the compiler's point of view, that code is executed by the code calling the constructor, not the constructor itself.
Tmdean
It still seems like the Java compiler would be quite capable of making sure that you don't violate rules 1/2/3 before delegating to another constructor. It can't be much harder than the checks done in a static{} block, or checking that final members are always assigned exactly once in each constructor.
Kip
A:

"I find this pretty ugly, but it helps me avoid duplicating code. The following is what I'd like to do, but it is illegal in Java ..."

You could also work around this limitation by using a static factory method that returns a new object:

public static BigFraction valueOf(BigDecimal d)
{
    // compute numerator and denominator from d

    return new BigFraction(numerator, denominator);
}
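With the computation from the question filled in, such a factory might look like the sketch below. Making the canonical constructor private is an assumption (a factory-only design would usually do that, and it is exactly the API break the asker objects to); the getters exist only so the result can be inspected.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Sketch: construction from a BigDecimal goes through a static factory instead of a constructor.
final class BigFraction {
    private final BigInteger numerator;
    private final BigInteger denominator;

    private BigFraction(BigInteger numerator, BigInteger denominator) {
        this.numerator = numerator;
        this.denominator = denominator;
    }

    // Ordinary statements are allowed here because no constructor has started yet.
    public static BigFraction valueOf(BigDecimal d) {
        BigInteger numerator;
        BigInteger denominator;
        if (d.scale() < 0) {
            numerator = d.unscaledValue().multiply(BigInteger.TEN.pow(-d.scale()));
            denominator = BigInteger.ONE;
        } else {
            numerator = d.unscaledValue();
            denominator = BigInteger.TEN.pow(d.scale());
        }
        return new BigFraction(numerator, denominator);
    }

    public BigInteger getNumerator() { return numerator; }
    public BigInteger getDenominator() { return denominator; }
}
```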

Alternatively, you could cheat by calling a private static method to do the computations for your constructor:

public BigFraction(BigDecimal d)
{
    this(computeNumerator(d), computeDenominator(d));
}

private static BigInteger computeNumerator(BigDecimal d) { ... }
private static BigInteger computeDenominator(BigDecimal d) { ... }
Zach Scrivena
+1 I was going to suggest the private static methods myself.
Michael Myers
Thanks, I am aware of both of these workarounds. Both feel like they would be unnecessary, in this case, if the compiler were a little smarter. On a bytecode level, it seems like the second suggestion would work almost identically to what I want to do, only it bloats the code more IMO
Kip
@Kip: I agree with you... Java certainly has its limitations. But I guess the required fundamental changes to the object initialization process aren't worth the effort, since there are already viable workarounds.
Zach Scrivena
A:

"My guess is that, until a constructor has been called for every level of the hierarchy, the object is in an invalid state. It is unsafe for the JVM to run anything on it until it has been completely constructed."

Actually, it is possible to construct objects in Java without calling every constructor in the hierarchy, although not with the new keyword.

For example, when Java's serialization constructs an object during deserialization, it calls the no-arg constructor of the first non-serializable superclass in the hierarchy. So when a java.util.HashMap is deserialized, a java.util.HashMap instance is first allocated, and then the constructor of its first non-serializable superclass, java.util.AbstractMap, is called (which in turn calls java.lang.Object's constructor).
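This behaviour is easy to observe with a counter in each constructor. In the sketch below (the class names are invented for the example), reading back a serialized copy does not run the serializable child's constructor, while the no-arg constructor of its non-serializable parent runs once more. The serialVersionUID is there only to avoid the usual compiler warning.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Not Serializable: deserialization invokes this no-arg constructor.
class NonSerializableParent {
    static int parentConstructorCalls = 0;

    NonSerializableParent() {
        parentConstructorCalls++;
    }
}

// Serializable: this constructor is skipped when an instance is deserialized.
class SerializableChild extends NonSerializableParent implements Serializable {
    private static final long serialVersionUID = 1L;
    static int childConstructorCalls = 0;
    final int value;

    SerializableChild(int value) {
        childConstructorCalls++;
        this.value = value;
    }
}

final class SerializationDemo {
    // Writes obj to a byte array and reads a new copy back.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The copy still has value restored from the stream, even though SerializableChild's constructor never ran for it.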

You can also use the Objenesis library to instantiate objects without calling the constructor.

Or if you are so inclined, you can generate the bytecode yourself (with ASM or similar). At the bytecode level, new Foo() compiles to three instructions:

NEW Foo
DUP
INVOKESPECIAL Foo.<init> ()V

If you want to avoid calling the constructor of Foo, you can change the INVOKESPECIAL instruction, for example:

NEW Foo
DUP
INVOKESPECIAL java/lang/Object.<init> ()V

But even then, every constructor of Foo must contain a call to a superclass constructor (or to another constructor of Foo). Otherwise the JVM's bytecode verifier will reject the class when it is loaded, complaining that there is no call to super().

Esko Luontola
A: 

Well, the problem is that Java cannot detect what statements you are going to put before the super call. For example, you could refer to member variables which are not yet initialized. So I don't think Java will ever support this. There are, however, many ways to work around this problem, for example by using factory or template methods.

A: 

Allowing code to not call the super constructor first breaks encapsulation: the idea that you can write code and be able to prove that, no matter what someone else does (extend it, invoke it, instantiate it), it will always be in a valid state.

IOW: it's not a JVM requirement as such, but a Comp Sci requirement. And an important one.

To solve your problem, incidentally, you can use private static methods, since they don't depend on any instance:

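public BigFraction(BigDecimal d)
{
  this(appropriateInitializationNumeratorFor(d),
       appropriateInitializationDenominatorFor(d));
}

private static BigInteger appropriateInitializationNumeratorFor(BigDecimal d)
{
  if(d.scale() < 0)
  {
    return d.unscaledValue().multiply(BigInteger.TEN.pow(-d.scale()));
  }
  else
  {
    return d.unscaledValue();
  }
}

private static BigInteger appropriateInitializationDenominatorFor(BigDecimal d)
{
  if(d.scale() < 0)
  {
    return BigInteger.ONE;
  }
  else
  {
    return BigInteger.TEN.pow(d.scale());
  }
}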

If you don't like having separate methods (a lot of common logic you only want to execute once, for instance), have one method that returns a small private static nested class, which is then passed to a private constructor.
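Applied to the BigFraction example, that suggestion might look like the following sketch; the holder name Parts and its factory method of are invented for illustration. Compared with returning a BigInteger[], the holder gives the two values names and a fixed shape.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Sketch of the "one helper returning a small nested class" workaround.
final class BigFraction {
    private final BigInteger numerator;
    private final BigInteger denominator;

    public BigFraction(BigInteger numerator, BigInteger denominator) {
        this.numerator = numerator;
        this.denominator = denominator;
    }

    public BigFraction(BigDecimal d) {
        this(Parts.of(d));
    }

    // Private constructor that unpacks the holder and delegates to the canonical constructor.
    private BigFraction(Parts parts) {
        this(parts.numerator, parts.denominator);
    }

    // Hypothetical holder: computes numerator and denominator together, once.
    private static final class Parts {
        final BigInteger numerator;
        final BigInteger denominator;

        private Parts(BigInteger numerator, BigInteger denominator) {
            this.numerator = numerator;
            this.denominator = denominator;
        }

        static Parts of(BigDecimal d) {
            int scale = d.scale();
            BigInteger unscaled = d.unscaledValue();
            if (scale < 0) {
                return new Parts(unscaled.multiply(BigInteger.TEN.pow(-scale)), BigInteger.ONE);
            }
            return new Parts(unscaled, BigInteger.TEN.pow(scale));
        }
    }

    public BigInteger getNumerator() { return numerator; }
    public BigInteger getDenominator() { return denominator; }
}
```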

But in that case you're still executing code before calling the superclass constructor; you're just doing it in a way that Java likes. I still don't see any good reason why they couldn't introduce a new syntax to allow it.
Kip
Yes, you are executing code, but not code that depends on the state of the object being constructed. For instance, try my example without making the methods static, or try passing *this* to the static method. And extra syntax? Call me old-fashioned, but Java has enough.
A: 

Look at it this way.

Let's say that an object is composed of 10 parts.

1,2,3,4,5,6,7,8,9,10

Ok?

Parts 1 to 9 are in the superclass; part #10 is your addition.

You simply cannot add the 10th part until the previous 9 are completed.

That's it.

If parts 1-6 come from yet another superclass, that's fine. The point is that one single object is created in a specific sequence; that's the way it was designed.

Of course the real reason is far more complex than this, but I think this pretty much answers the question.

As for the alternatives, I think there are plenty already posted here.
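The "specific sequence" in this analogy is observable in real code. In the sketch below (the class names and the logStep helper are invented for illustration), the subclass's field initializer, standing in for part #10, only runs once the superclass constructor has finished.

```java
import java.util.ArrayList;
import java.util.List;

// Logs each step of constructing a Child to show the fixed order.
class Parent {
    static final List<String> LOG = new ArrayList<>();

    Parent() {
        LOG.add("Parent constructor body");
    }
}

class Child extends Parent {
    // Field initializers run only after Parent's constructor has returned.
    final String part10 = logStep("Child field initializer");

    Child() {
        super();
        LOG.add("Child constructor body");
    }

    private static String logStep(String step) {
        LOG.add(step);
        return step;
    }
}
```

Creating new Child() leaves LOG as [Parent constructor body, Child field initializer, Child constructor body].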

OscarRyz
But if I call "super(Integer.parseInt(s));", the parseInt is getting executed before even part 1 in your analogy. I'm just saying: let me execute more than one line of code before part 1. The compiler is perfectly capable of ensuring that I don't have access to parts 1-10 at this point.
Kip
That's correct, but that Integer.parseInt(s) belongs to another class's sequence [let's say Integer(0,1,2,3, whatever)], while the "this" instance has not been created yet. It's like attempting to perform surgery on an unborn baby. You can do many other tasks with his brother, or with the doctor...
OscarRyz
...as they have already been instantiated :-) But not with the unborn baby, at least not with traditional methods (you can do in-utero surgery, of course; that's the scenario where you say you can manipulate the bytecode), but that's kind of an extreme situation. In normal conditions, you have to wait.
OscarRyz
A:

I think several of the answers here are wrong because they assume encapsulation is somehow broken by running some code before calling super(). The fact is that the superclass constructor can itself break encapsulation, because Java allows a constructor to call methods that a subclass overrides.

Consider these classes:

class A {
  protected int i;
  public void print() { System.out.println("Hello"); }
  public A() { i = 13; print(); }
}

class B extends A {
  private String msg;
  public void print() { System.out.println(msg); }
  public B(String msg) { super(); this.msg = msg; }
}

If you do

new B("Hello");

the message printed out is "null". That's because the constructor from A is accessing the uninitialized field from B. So frankly it seems that if someone wanted to do this:

class C extends A {
  public C() { 
    System.out.println(i); // i not yet initialized
    super();
  }
}

Then that's just as much their problem as if they make class B above. In both cases the programmer has to know how the variables are accessed during construction. And given that you can call super() or this() with all kinds of expressions in the parameter list, it seems like an artificial restriction that you can't compute any expressions before calling the other constructor. Not to mention that the restriction applies to both super() and this() when presumably you know how to not break your own encapsulation when calling this().

My verdict: this feature is a bug in the compiler, perhaps originally motivated by a good reason, but in its current form it is an artificial limitation with no purpose.

Mr. Shiny and New
thanks for the well-written response and clear example code
Kip
@Kip: No problem. I was hoping, when I found this question, to get some good answers but it seems there isn't one.
Mr. Shiny and New