In Hidden Features of Java the top answer mentions Double Brace Initialization, with a very enticing syntax:

Set<String> flavors = new HashSet<String>() {{
    add("vanilla");
    add("strawberry");
    add("chocolate");
    add("butter pecan");
}};

This idiom creates an anonymous inner class with just an instance initializer in it, which "can use any [...] methods in the containing scope".

Main question: Is this as inefficient as it sounds? Should its use be limited to one-off initializations? (And of course showing off!)

Second question: The new HashSet must be the "this" used in the instance initializer ... can anyone shed light on the mechanism?

Third question: Is this idiom too obscure to use in production code?

Summary: Very, very nice answers, thanks everyone. On question (3), people felt the syntax should be clear (though I'd recommend an occasional comment, especially if your code will pass on to developers who may not be familiar with it).

On question (1), the generated code should run quickly. The extra .class files do cause jar file clutter and slow program startup slightly (thanks to coobird for measuring that). Thilo pointed out that garbage collection can be affected, and the memory cost for the extra loaded classes may be a factor in some cases.

Question (2) turned out to be most interesting to me. If I understand the answers, what's happening in DBI is that the anonymous inner class extends the class of the object being constructed by the new operator, and hence has a "this" value referencing the instance being constructed. Very neat.

Overall, DBI strikes me as something of an intellectual curiosity. Coobird and others point out you can achieve the same effect with Arrays.asList, varargs methods, Google Collections, and the proposed Java 7 Collection literals. Newer JVM languages like Scala, JRuby, and Groovy also offer concise notations for list construction, and interoperate well with Java. Given that DBI clutters up the classpath, slows down class loading a bit, and makes the code a tad more obscure, I'd probably shy away from it. However, I plan to spring this on a friend who's just gotten his SCJP and loves good-natured jousts about Java semantics! ;-) Thanks everyone!

+13  A: 

Taking the following test class:

public class Test {
  public void test() {
    Set<String> flavors = new HashSet<String>() {{
        add("vanilla");
        add("strawberry");
        add("chocolate");
        add("butter pecan");
    }};
  }
}

and then decompiling the class file, I see:

public class Test {
  public void test() {
    java.util.Set flavors = new HashSet() {

      final Test this$0;

      {
        this$0 = Test.this;
        super();
        add("vanilla");
        add("strawberry");
        add("chocolate");
        add("butter pecan");
      }
    };
  }
}

This doesn't look terribly inefficient to me. If I were worried about performance for something like this, I'd profile it. And your question #2 is answered by the above code: You're inside an implicit constructor (and instance initializer) for your inner class, so "this" refers to this inner class.

Yes, this syntax is obscure, but a comment can clarify obscure syntax usage. To clarify the syntax, most people are familiar with a static initializer block (JLS 8.7 Static Initializers):

public class Sample1 {
    private static final String someVar;
    static {
        String temp = null;
        // ... block of code setting temp
        someVar = temp;
    }
}

You can also use a similar syntax (without the word "static") for constructor usage (JLS 8.6 Instance Initializers), although I have never seen this used in production code. This is much less commonly known.

public class Sample2 {
    private final String someVar;

    // This is an instance initializer
    {
        String temp = null;
        // ... block of code setting temp
        someVar = temp;
    }
}
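
To see how an instance initializer behaves at run time, here is a minimal sketch. The class name InitOrderDemo and its log field are made up for illustration; the point is only that the initializer block runs as part of every constructor, before that constructor's own body.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, for illustration only.
class InitOrderDemo {
    final List<String> log = new ArrayList<String>();

    // Instance initializer: runs as part of every constructor,
    // after the implicit super() call and before the constructor body.
    {
        log.add("initializer");
    }

    InitOrderDemo() {
        log.add("no-arg constructor");
    }

    InitOrderDemo(String label) {
        log.add("constructor(" + label + ")");
    }

    public static void main(String[] args) {
        System.out.println(new InitOrderDemo().log);    // [initializer, no-arg constructor]
        System.out.println(new InitOrderDemo("x").log); // [initializer, constructor(x)]
    }
}
```

Both constructors record the initializer entry first, which is why a class with only an initializer block behaves as if the block were its constructor.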

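The compiler copies the instance initializer block into every constructor of the class, right after the call to the superclass constructor; if you declare no constructor at all, it ends up in the implicit default constructor. With this in mind, unravel the double brace code: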

public void test() {
  Set<String> flavors = new HashSet<String>() {
      {
        add("vanilla");
        add("strawberry");
        add("chocolate");
        add("butter pecan");
      }
  };
}

The block of code between the inner-most braces is an instance initializer, which the compiler folds into the anonymous class's implicit constructor. The outer-most braces delimit the anonymous inner class. To take the final step and make everything non-anonymous:

public void test() {
  Set<String> flavors = new MyHashSet();
}

class MyHashSet extends HashSet<String> {
    public MyHashSet() {
        add("vanilla");
        add("strawberry");
        add("chocolate");
        add("butter pecan");
    }
}

For initialization purposes, I'd say there is no overhead whatsoever (or so small that it can be neglected). However, every use of flavors will go not against HashSet but instead against MyHashSet. There is probably a small (and quite possibly negligible) overhead to this. But again, before I worried about it, I would profile it.

Again, to your question #2, the above code is the logical and explicit equivalent of double brace initialization, and it makes it obvious where "this" refers: To the inner class that extends HashSet.
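
If you want to check this for yourself, a small sketch along these lines should do it. DbiClassDemo and its two helper methods are names invented for this example; only standard java.lang.Class and java.util calls are used.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class, for illustration only.
class DbiClassDemo {
    static Set<String> viaDoubleBrace() {
        return new HashSet<String>() {{
            add("vanilla");      // "this" here is the anonymous subclass instance
            add("strawberry");
        }};
    }

    static Set<String> viaPlainHashSet() {
        Set<String> s = new HashSet<String>();
        s.add("vanilla");
        s.add("strawberry");
        return s;
    }

    public static void main(String[] args) {
        Set<String> dbi = viaDoubleBrace();
        System.out.println(dbi.getClass().isAnonymousClass());               // true
        System.out.println(dbi.getClass().getSuperclass() == HashSet.class); // true
        System.out.println(dbi.getClass() == HashSet.class);                 // false
        System.out.println(dbi.equals(viaPlainHashSet()));                   // true
    }
}
```

The set compares equal to a plain HashSet with the same elements, because set equality is defined by contents, but its runtime class is a distinct anonymous subclass.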

If you have questions about the details of instance initializers, check out the details in the JLS documentation.

Eddie
Eddie, very nice explanation. If the JVM byte codes are as clean as the decompilation, the execution speed will be fast enough, though I'd be somewhat concerned about the extra .class file clutter. I'm still curious as to why the constructor of the instance initializer sees "this" as the new HashSet<String> instance and not the Test instance. Is this just explicitly specified behavior in the latest Java Language Specification to support the idiom?
Jim Ferrans
I updated my answer. I left out the boilerplate of the Test class, which caused the confusion. I put it into my answer to make things more obvious. I also mention the JLS section for the instance initializer blocks used in this idiom.
Eddie
@Jim The interpretation of "this" is not a special case; it simply refers to the instance of the innermost enclosing class, which is the anonymous subclass of HashSet<String>.
Nathan Kitchen
@Nathan, ah, thanks, that makes sense.
Jim Ferrans
A: 

1) This will call add() for each member. If you can find a more efficient way to put items into a hash set, then use that. Note that the inner class will likely generate garbage, if you're sensitive about that.

2) It seems to me as if the context is the object returned by "new," which is the HashSet.

3) If you need to ask... More likely: will the people who come after you know this or not? Is it easy to understand and explain? If you can answer "yes" to both, feel free to use it.

+3  A: 

Efficiency aside, I rarely find myself wishing for declarative collection creation outside of unit tests. I do believe that the double brace syntax is very readable.

Another way to achieve the declarative construction of lists specifically is to use Arrays.asList(T ...) like so:

List<String> aList = Arrays.asList("vanilla", "strawberry", "chocolate");

The limitation of this approach is of course that you cannot control the specific type of list to be generated.
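
One way around that limitation, sketched here with a made-up class name (AsListDemo), is to copy the asList result into whatever collection type you actually want. The sketch also shows a related caveat: the list returned by Arrays.asList is fixed-size, so add() on it throws UnsupportedOperationException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class, for illustration only.
class AsListDemo {
    // Returns true if adding to the Arrays.asList result is rejected.
    static boolean addIsRejected() {
        List<String> fixed = Arrays.asList("vanilla", "strawberry", "chocolate");
        try {
            fixed.add("butter pecan");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Copies the fixed-size list into an ArrayList, which can grow.
    static List<String> growableCopy() {
        List<String> list = new ArrayList<String>(
                Arrays.asList("vanilla", "strawberry", "chocolate"));
        list.add("butter pecan");
        return list;
    }

    public static void main(String[] args) {
        System.out.println(addIsRejected());       // true
        System.out.println(growableCopy().size()); // 4
    }
}
```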

Paul Morie
Arrays.asList() is what I would normally use, but you're right, this situation arises mainly in unit tests; real code would construct the lists from DB queries, XML, and so on.
Jim Ferrans
Beware of asList, though: the returned list does not support adding or removing elements. Whenever I use asList, I pass the resulting list into a constructor like `new ArrayList<String>(Arrays.asList("vanilla", "strawberry", "chocolate"))` to get around this problem.
Michael Myers
Good point, thanks for mentioning it.
Paul Morie
+54  A: 

Update: Added an experiment to evaluate the performance of double brace initialization.

Here's the problem when I get too carried away with anonymous inner classes:

2009/05/27  16:35             1,602 DemoApp2$1.class
2009/05/27  16:35             1,976 DemoApp2$10.class
2009/05/27  16:35             1,919 DemoApp2$11.class
2009/05/27  16:35             2,404 DemoApp2$12.class
2009/05/27  16:35             1,197 DemoApp2$13.class

/* snip */

2009/05/27  16:35             1,953 DemoApp2$30.class
2009/05/27  16:35             1,910 DemoApp2$31.class
2009/05/27  16:35             2,007 DemoApp2$32.class
2009/05/27  16:35               926 DemoApp2$33$1$1.class
2009/05/27  16:35             4,104 DemoApp2$33$1.class
2009/05/27  16:35             2,849 DemoApp2$33.class
2009/05/27  16:35               926 DemoApp2$34$1$1.class
2009/05/27  16:35             4,234 DemoApp2$34$1.class
2009/05/27  16:35             2,849 DemoApp2$34.class

/* snip */

2009/05/27  16:35               614 DemoApp2$40.class
2009/05/27  16:35             2,344 DemoApp2$5.class
2009/05/27  16:35             1,551 DemoApp2$6.class
2009/05/27  16:35             1,604 DemoApp2$7.class
2009/05/27  16:35             1,809 DemoApp2$8.class
2009/05/27  16:35             2,022 DemoApp2$9.class

These are all classes that were generated when I was writing a simple application that used copious amounts of anonymous inner classes -- each one is compiled into a separate class file.

The "double brace initialization", as already mentioned, is an anonymous inner class with an instance initialization block, which means that a new class is created for each "initialization", usually all for the purpose of making a single object.

Considering that the Java Virtual Machine needs to read all those classes when using them, that can cost some time in bytecode verification and related steps. Not to mention the extra disk space needed to store all those class files.

It seems as if there is a bit of overhead when utilizing double-brace initialization, so it's probably not such a good idea to go too overboard with it. But as Eddie has noted in the comments, it's not possible to be absolutely sure of the impact.


Just for reference, double brace initialization is the following:

List<String> list = new ArrayList<String>() {{
    add("Hello");
    add("World!");
}};

It looks like a "hidden" feature of Java, but it is just a rewrite of:

List<String> list = new ArrayList<String>() {

    // Instance initialization block
    {
        add("Hello");
        add("World!");
    }
};

So it's basically an instance initialization block that is part of an anonymous inner class.
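
A quick way to see the "one class per initialization" effect is to compare the runtime classes produced by two separate double brace expressions. This is only a sketch; ClassPerSiteDemo, siteA and siteB are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, for illustration only.
class ClassPerSiteDemo {
    static List<String> siteA() {
        return new ArrayList<String>() {{ add("Hello"); }};
    }

    static List<String> siteB() {
        return new ArrayList<String>() {{ add("Hello"); }};
    }

    public static void main(String[] args) {
        // Two places in the source: two distinct anonymous classes.
        System.out.println(siteA().getClass() == siteB().getClass()); // false
        // The same place executed twice: one class, two instances.
        System.out.println(siteA().getClass() == siteA().getClass()); // true
        System.out.println(siteA().getClass().getName());
        System.out.println(siteB().getClass().getName());
    }
}
```

So the number of generated classes follows the number of double brace expressions in the source, not the number of times they execute.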


As an added note, if Joshua Bloch's Collection Literals proposal for Project Coin goes through, we may be able to see these kinds of syntax in Java 7:

List<Integer> intList = [1, 2, 3, 4];

Set<String> strSet = {"Apple", "Banana", "Cactus"};

Map<String, Integer> truthMap = { "answer" : 42 };

If this change makes it to Java 7, it may eliminate a good portion of the use cases for double brace initialization.


Experiment

Here's the simple experiment I've tested -- make 1000 ArrayLists with the elements "Hello" and "World!" added to them via the add method, using the two methods:

Method 1: Double Brace Initialization

List<String> l = new ArrayList<String>() {{
  add("Hello");
  add("World!");
}};

Method 2: Instantiate an ArrayList and add

List<String> l = new ArrayList<String>();
l.add("Hello");
l.add("World!");

I created a simple program to write out a Java source file to perform 1000 initializations using the two methods:

Test 1:

class Test1 {
  public static void main(String[] s) {
    long st = System.currentTimeMillis();

    List<String> l0 = new ArrayList<String>() {{
      add("Hello");
      add("World!");
    }};

    List<String> l1 = new ArrayList<String>() {{
      add("Hello");
      add("World!");
    }};

    /* snip */

    List<String> l999 = new ArrayList<String>() {{
      add("Hello");
      add("World!");
    }};

    System.out.println(System.currentTimeMillis() - st);
  }
}

Test 2:

class Test2 {
  public static void main(String[] s) {
    long st = System.currentTimeMillis();

    List<String> l0 = new ArrayList<String>();
    l0.add("Hello");
    l0.add("World!");

    List<String> l1 = new ArrayList<String>();
    l1.add("Hello");
    l1.add("World!");

    /* snip */

    List<String> l999 = new ArrayList<String>();
    l999.add("Hello");
    l999.add("World!");

    System.out.println(System.currentTimeMillis() - st);
  }
}

Please note that the elapsed time to initialize the 1000 ArrayLists and the 1000 anonymous inner classes extending ArrayList is measured with System.currentTimeMillis, so the timer does not have a very high resolution. On my Windows system, the resolution is around 15-16 milliseconds.

The results for 10 runs of the two tests were the following:

Test1 Times (ms)           Test2 Times (ms)
----------------           ----------------
           187                          0
           203                          0
           203                          0
           188                          0
           188                          0
           187                          0
           203                          0
           188                          0
           188                          0
           203                          0

As can be seen, the double brace initialization has a noticeable execution time of around 190 ms.

Meanwhile, the ArrayList initialization execution time came out to be 0 ms. Of course, the timer resolution should be taken into account, but it is likely to be under 15 ms.

So, there seems to be a noticeable difference in the execution time of the two methods. It does appear that double brace initialization carries some overhead compared with plain initialization.

And yes, there were 1000 .class files generated by compiling the Test1 double brace initialization test program.
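
For anyone who wants to separate the one-time class loading cost from the steady-state cost, here is a rough sketch of a different experiment: a single double brace site, timed on first use and then again in a loop after a warm-up. This is not the program used for the numbers above, and it is not a rigorous benchmark (a harness such as JMH would be more trustworthy). DbiTimingSketch, timeNanos and the iteration counts are all assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, for illustration only.
class DbiTimingSketch {
    static List<String> doubleBrace() {
        return new ArrayList<String>() {{
            add("Hello");
            add("World!");
        }};
    }

    static List<String> plain() {
        List<String> l = new ArrayList<String>();
        l.add("Hello");
        l.add("World!");
        return l;
    }

    // Builds the list `iterations` times and returns the elapsed nanoseconds.
    static long timeNanos(int iterations, boolean useDoubleBrace) {
        int sink = 0; // consume the results so the work is not trivially dead
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += (useDoubleBrace ? doubleBrace() : plain()).size();
        }
        long elapsed = System.nanoTime() - start;
        if (sink != 2 * iterations) throw new IllegalStateException("unexpected list size");
        return elapsed;
    }

    public static void main(String[] args) {
        long firstUse = timeNanos(1, true); // includes loading the anonymous class
        timeNanos(10000, true);             // warm-up
        timeNanos(10000, false);            // warm-up
        long dbiLoop = timeNanos(100000, true);
        long plainLoop = timeNanos(100000, false);
        System.out.println("first double brace use (ns): " + firstUse);
        System.out.println("100000 double brace (ns):    " + dbiLoop);
        System.out.println("100000 plain (ns):           " + plainLoop);
    }
}
```

No expected numbers are given here on purpose, since they depend heavily on the JVM and machine.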

Finally, thank you for reading this extremely long answer!

coobird
"Probably" being the operative word. Unless measured, no statements about performance are meaningful.
Daniel Straight
Coobird, thanks this is helpful.
Jim Ferrans
You say "there is actually quite a bit of overhead when performing double-brace initialization," but it really doesn't look like this is the case. I won't believe such a statement without profiling, in any case.
Eddie
That's true, it's hard to be truly sure if there is a large overhead with double brace initialization without profiling. I'll change the wording.
coobird
Isn't this just because you're loading a class each time?
Neil Coffey
@Neil Coffey: Yes, because each double brace initialization is a separate anonymous inner class, it will create a new class for each initialization.
coobird
You've done such a great job I hardly want to say this, but the Test1 times could be dominated by class loads. It would be interesting to see someone run a single instance of each test in a for loop say 1,000 times, then run it again in a second for loop of 1,000 or 10,000 times and print out the time difference (System.nanoTime()). The first for loop should get past all the warm up effects (JIT, classload, eg). Both tests model different use-cases though. I'll try to run this tomorrow at work.
Jim Ferrans
@Jim Ferrans: I'm fairly certain that the Test1 times are from class loads. But the consequence of using double brace initialization is having to cope with class loads. Since I believe most use cases for double brace init. are one-time initializations, the test is close in conditions to a typical use case of this type of initialization. I would expect multiple iterations of each test to make the execution time gap smaller.
coobird
@coobird: Yes, you're right, it would be a very strange use-case that put one of these puppies in an inner loop.
Jim Ferrans
Even if you put it into a loop, there would still be only one inner class (just many instances thereof).
Thilo
@coobird: I hope you didn't write that test code by hand! @Jim Ferrans: even if this was in an inner loop, you only have to pay for class loading one time. It's a one-time cost per inner class. @coobird's test had 1000 separate anonymous inner classes!!!
Eddie
To do this test and properly "warm up" the code being used, I think you have to use a class loader so you can unload the class objects after each test and thus force reloading the class objects each time. It's non-trivial.
Eddie
@Eddie: Don't worry, I wrote a short program to generate the source code for the tests. :)
coobird
What this proves is that a) double-brace initialization is slower, and b) even if you do it 1000 times, you probably won't notice the difference. And it's not like this could be the bottleneck in an inner loop, either. It imposes a tiny one-time penalty AT THE VERY WORST.
Michael Myers
That and a bit of permgen space, actually. But they're such tiny classes that I doubt that's much of a problem.
Michael Myers
Thank you, a really great answer.
Rahul Garg
If using DBI makes the code more readable or expressive, then use it. The fact that it increases a bit the work the JVM has to perform is not a valid argument, in itself, against it. If it were, then we should also be worried about extra helper methods/classes, preferring instead huge classes with fewer methods...
Rogerio
+7  A: 

To create sets you can use a varargs factory method instead of double-brace initialisation:

public static <T> Set<T> setOf(T... elements) {
    return new HashSet<T>(Arrays.asList(elements));
}
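
For completeness, here is how such a factory might look inside a class, together with a call site. SetOfDemo is a made-up holder class, and the @SafeVarargs annotation (available from Java 7) is my addition to silence the generic varargs warning; it is not part of the snippet above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical holder class, for illustration only.
class SetOfDemo {
    @SafeVarargs // assumption: Java 7 or later
    static <T> Set<T> setOf(T... elements) {
        return new HashSet<T>(Arrays.asList(elements));
    }

    public static void main(String[] args) {
        Set<String> flavors = setOf("vanilla", "strawberry", "chocolate", "butter pecan");
        System.out.println(flavors.size());                    // 4
        System.out.println(flavors.getClass() == HashSet.class); // true
    }
}
```

Unlike double brace initialization, the result is a plain HashSet, so no extra class is generated per call site.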

The Google Collections library has lots of convenience methods like this, as well as loads of other useful functionality.

As for the idiom's obscurity, I encounter it and use it in production code all the time. I'd be more concerned about programmers who get confused by the idiom being allowed to write production code.

Nat
Hah! ;-) I'm actually a Rip van Winkle returning to Java from the 1.2 days (I wrote the VoiceXML voice web browser at http://evolution.voxeo.com/ in Java). It's been fun learning generics, parameterized types, Collections, java.util.concurrent, the new for loop syntax, etc. It's a better language now. To your point, even though the mechanism behind DBI may seem obscure at first, the meaning of the code should be pretty clear.
Jim Ferrans
A: 

I second Nat's answer, except I would use a loop instead of creating and immediately tossing the implicit List from asList(elements):

public static <T> Set<T> setOf(T... elements) {
    // elements is an array, so its size is elements.length
    Set<T> set = new HashSet<T>(elements.length);
    for (T elm : elements) { set.add(elm); }
    return set;
}
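
If you go the loop route, it may be worth sizing the HashSet with the load factor in mind so it does not have to rehash while filling. The sketch below assumes the default load factor of 0.75; PresizedSetOf is a hypothetical class name and @SafeVarargs assumes Java 7 or later.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical holder class, for illustration only.
class PresizedSetOf {
    @SafeVarargs // assumption: Java 7 or later
    static <T> Set<T> setOf(T... elements) {
        // Pick a capacity large enough that elements.length entries stay
        // under the default 0.75 load factor, with a floor of 16.
        int capacity = Math.max((int) (elements.length / 0.75f) + 1, 16);
        Set<T> set = new HashSet<T>(capacity);
        for (T element : elements) {
            set.add(element);
        }
        return set;
    }

    public static void main(String[] args) {
        Set<String> flavors = setOf("vanilla", "strawberry", "chocolate");
        System.out.println(flavors.contains("chocolate")); // true
    }
}
```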
Software Monkey
Why? The new object will be created in the eden space, and so only require two or three pointer additions to instantiate. The JVM may notice that it never escapes beyond the method scope and so allocate it on the stack.
Nat
Yeah, it's likely to end up more efficient than that code (although you can improve it by telling the `HashSet` a suggested capacity - remember load factor).
Tom Hawtin - tackline
Well, the HashSet constructor has to do the iteration anyway, so it's not going to be *less* efficient. Library code created for reuse should always strive to be the very *best* possible.
Software Monkey
+3  A: 

There's generally nothing particularly inefficient about it. It doesn't generally matter to the JVM that you've made a subclass and added a constructor to it-- that's a normal, everyday thing to do in an object-oriented language. I can think of quite contrived cases where you could cause an inefficiency by doing this (e.g. you have a repeatedly-called method that ends up taking a mixture of different classes because of this subclass, whereas ordinarily the class passed in would be totally predictable-- in the latter case, the JIT compiler could make optimisations that are not feasible in the first). But really, I think the cases where it'll matter are very contrived.

I'd see the issue more from the point of view of whether you want to "clutter things up" with lots of anonymous classes. As a rough guide, consider using the idiom no more than you'd use, say, anonymous classes for event handlers.

In (2), you're inside the constructor of an object, so "this" refers to the object you're constructing. That's no different to any other constructor.

As for (3), that really depends on who's maintaining your code, I guess. If you don't know this in advance, then a benchmark that I would suggest using is "do you see this in the source code to the JDK?" (in this case, I don't recall seeing many anonymous initialisers, and certainly not in cases where that's the only content of the anonymous class). In most moderately sized projects, I'd argue you're really going to need your programmers to understand the JDK source at some point or other, so any syntax or idiom used there is "fair game". Beyond that, I'd say, train people on that syntax if you have control of who's maintaining the code, else comment or avoid.

Neil Coffey
+9  A: 

One property of this approach that has not been pointed out so far is that because you create inner classes, the whole containing class is captured in its scope. This means that as long as your Set is alive, it will retain a pointer to the containing instance (this$0) and keep that from being garbage-collected, which could be an issue.

This, and the fact that a new class gets created in the first place even though a regular HashSet would work just fine (or even better), makes me not want to use this construct (even though I really long for the syntactic sugar).
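
The hidden reference to the containing instance can be observed with reflection. The following is only a sketch: CaptureDemo, its prefix field and holdsOuterReference are invented for the example. The initializer deliberately reads a field of the outer object so that the reference is certainly needed; as far as I know, newer compilers may drop the hidden field when the outer instance is never used, so results for an initializer that ignores the outer object can vary by compiler version.

```java
import java.lang.reflect.Field;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class, for illustration only.
class CaptureDemo {
    final String prefix;

    CaptureDemo(String prefix) {
        this.prefix = prefix;
    }

    // Instance context: the initializer reads an outer field, so the
    // anonymous class must keep a reference to the enclosing CaptureDemo.
    Set<String> fromInstanceMethod() {
        return new HashSet<String>() {{
            add(prefix + "vanilla");
        }};
    }

    // Static context: there is no enclosing instance to capture.
    static Set<String> fromStaticMethod() {
        return new HashSet<String>() {{
            add("vanilla");
        }};
    }

    // Looks for a field of type CaptureDemo declared by the object's own class.
    static boolean holdsOuterReference(Object o) {
        for (Field f : o.getClass().getDeclaredFields()) {
            if (f.getType() == CaptureDemo.class) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(holdsOuterReference(new CaptureDemo("flavor:").fromInstanceMethod())); // true
        System.out.println(holdsOuterReference(fromStaticMethod()));                              // false
    }
}
```

As long as the returned set is reachable, the CaptureDemo instance it points to cannot be collected either.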

Second question: The new HashSet must be the "this" used in the instance initializer ... can anyone shed light on the mechanism? I'd have naively expected "this" to refer to the object initializing "flavors".

This is just how inner classes work. They get their own this, but they also have pointers to the parent instance, so that you can call methods on the containing object as well. In case of a naming conflict, the inner class (in your case HashSet) takes precedence, but you can prefix "this" with a classname to get the outer method as well.

public class Test {

    public void add(Object o) {
    }

    public Set<String> makeSet() {
        return new HashSet<String>() {
            {
              add("hello"); // HashSet
              Test.this.add("hello"); // outer instance 
            }
        };
    }
}

To be clear on the anonymous subclass being created, you could define methods in there as well. For example, override HashSet.add():

    public Set<String> makeSet() {
        return new HashSet<String>() {
            {
              add("hello"); // not HashSet anymore ...
            }

            @Override
            public boolean add(String s) {
                // custom behavior would go here
                return super.add(s);
            }

        };
    }
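
Here is a small runnable sketch of that point, showing that add() inside the initializer dispatches to the override in the anonymous subclass. OverrideDemo and the upper-casing behavior are made up for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class, for illustration only.
class OverrideDemo {
    static Set<String> makeSet() {
        return new HashSet<String>() {
            {
                add("hello"); // calls the override below, not HashSet.add directly
            }

            @Override
            public boolean add(String s) {
                return super.add(s.toUpperCase());
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(makeSet()); // [HELLO]
    }
}
```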
Thilo
Very good point on the hidden reference to the containing class. In the original example, the instance initializer is calling the add() method of the new HashSet<String>, not Test.this.add(). That suggests to me that something else is happening. Is there an anonymous inner class for the HashSet<String>, as Nathan Kitchen suggests?
Jim Ferrans
Very insightful observation, Thilo. +1
Oak
A: 

Mario Gleichman describes how to use Java 1.5 generic functions to simulate Scala List literals, though sadly you wind up with fixed-size Lists (Arrays.asList returns a list that cannot grow or shrink).

He defines this class:

package literal;

import java.util.Arrays;
import java.util.List;

public class collection {
    public static <T> List<T> List(T...elems){
        return Arrays.asList( elems );
    }
}

and uses it thusly:

import static literal.collection.List;
import java.util.List;

public class CollectionDemo {
    public void demoList(){
        List<String> slist = List( "a", "b", "c" );
        List<Integer> iList = List( 1, 2, 3 );
        for( String elem : List( "a", "java", "list" ) )
            System.out.println( elem );
    }
}

Google Collections supports a similar idea for list construction. In this interview, Jared Levy says:

[...] the most heavily-used features, which appear in almost every Java class I write, are static methods that reduce the number of repetitive keystrokes in your Java code. It's so convenient being able to enter commands like the following:

Map<OneClassWithALongName, AnotherClassWithALongName> = Maps.newHashMap();

List<String> animals = Lists.immutableList("cat", "dog", "horse");

Jim Ferrans