views:

334

answers:

9

Hi,

Recently I'm encountering this type of code ever more frequently:

List<Something> list = new LinkedList<Something>();
// add a bunch of constants to the list
// keep in mind that the number of items added is known on list creation
return list.toArray(new Something[list.size()]);

Now in my honest opinion this is improper use of the Collections API and while I wouldn't make an issue out of this if I see it once or twice, I'm now encountering this everywhere.

I tried speaking to the author using something like: "hey why don't you use an ArrayList instead, since this is exactly what you need, because its toArray operation is constant speed". However in the authors opinion this is premature optimization and I should not care about it.

I completely agree that optimization should be done only if needed (and even than rarely :)), but in my opinion picking the correct collection to use is not premature optimization (nor is it time consuming or hard ... using LinkedList everywhere is simply 'bad code' IMHO).

So my question is. Is me worrying about these code blocks premature optimization in your opinion ? And if not how would you approach the situation, would you speak to the author again and what arguments would you use ?

+2  A: 

It's a matter of taste, in fact.

In the head of this developper, I think the idea is quite simple : throw at that collection interface whatever collection implementation I like, and change it only if a bottleneck is found due to this code.

However, in your head, the idea is more : as I can use a naturally efficient implementation of that collection, why use another ?

I personnally tend to prefer your point of view, as there is no reason to use inefficient code when making efficient is only a matter of chaning one line (generally visible on the screen). However, I do understand the contrary point of view is quite valid (although I must confess it would make me crazy if used as a systematical rule of thumb).

Beside, the reference to "premature optimization" in this context, once the code has been totally written is bullshit (or, to say it more politely, passive agressive style bullshit) : the optimization time is now, when the code is written, and not once the QA complains about sluggish applciation.

Riduidel
+23  A: 

In my opinion this is not premature optimization. Choosing the right collection takes no more than 5 seconds if the developer is familiar with the available collections. And writing inefficient code is just bad. If the case is more complex, then one could easily go for one list (the default being ArrayList) without thinking about all possible scenarios. But for simple ones like this - choose the right collection. These 5 seconds will save you a whole day later when it turns out that this method is called millions of times and is a bottleneck.

So my point is: "don't optimize prematurely" != "write crappy code until the problem hits you"

Bozho
In addition, this is the rule I usually follow: when in doubt, use `ArrayList`. I don't think anyone will dispute that this is optimal List implementation for 99% cases.
Nikita Rybak
@Nikita - indeed. `ArrayList` is the "default" collection.
Bozho
If the number of items to be added is known why create a `List` at all?
Qwerky
One of the few optimizations that isn't premature. But coding to the interface is even better.
DJClayworth
@Qwuerky - a very good point.
Bozho
I think I agree with Nikita here. @Qwerky, a minor reason would be that I this list.add(thing); seems a (very) tiny portion more readable than array[n] = thing;. It probably depends on what you're used to seeing but generally collection operations, having names, are (a little !) more readable than array operations.
Simeon
+5  A: 

There is a difference between choosing fitting algorithms and data structures and trying to squeeze a few CPU cycles out of a (sub-)program. The latter is generally frowned up except in rare circumstances, mostly because it trades maintainability for (usually) minor performance improvements. The former, on the contrary, will often give slightly cleaner and more unstandable code and also(!) improve performance considerably, at least for larger input. To choose a more obvious example, is doing lots of linear searches ("did someone already add this?") over an array as opposed to using a set ok ("no premature optimization!") or simply stupid? I hope we can all agree that there is a much more appropriate data structure. And as said before, this does not only improve performance but also aids maintainability: In this example, using a set tells the reader immediately that order doesn't matter but membership does. In your code, it's less of an issue, but using an ArrayList sure doesn't hurt maintainability. You have a fixed number of elements to add and you will eventually need to turn it into an array - an ArrayList simply is the obvious choice here, and the maintainer won't have to wonder why the heck this is a list.

delnan
This is EXACTLY my point :) IMO Array list is simply the obvious choice. I'm not sure however that this is big enough of an issue to criticize someone's code over and this is my general question.
Simeon
+1  A: 

In general the argument about whether ArrayList is better than LinkedList isn't answerable without knowing exactly what else is happening to the collection. Until you've finished writing the code that's an unknown.

The key thing here is to ensure that the code is written to the List interface, not to either ArrayList or LinkedList. Provided you do that, then if you decide to change the implementation later it's a one-line change. Worrying about one-line changes at this stage is definitely premature.

DJClayworth
A: 

Since we are talking about optimizations, there is another optimization you can do:

You mentioned in the question that you know the number of items that will be added to the List. I would suggest using the overloaded ArrayList constructor and provide the initialCapacity parameter. Note that LinkedList doesn't have such a constructor.

This will further negate the argument that you should use a LinkedList because you are doing a bunch of insertions. Java internally uses an array to represent an ArrayList. It starts out with a small size and as you continue to add items to the ArrayList and filling it up, Java has to copy the entire array into a new location. Java won't need to do this if you specify the numbers of items you'll be inserting with the constructor.

Note: Sorry, this probably belongs in the comments but I don't have the REP to comment yet.

kasgoku
+3  A: 

In general, choosing an appropriate kind of collection is important part of algorithmic design. The difference between O(N) versus O(NlogN) versus O(N^2) can determine whether a core algorithm (and therefore the entire application) is going to be viable.

However, collection choice is not always important.

In the example you chose, it seems that you creating a list of constants for some purpose. The chances are that this happens only once during the execution of the application, and is therefore NOT likely to impact significantly on application performance, assuming that it is long running. Furthermore, the fact that the size of the list is known before you start suggests that is is small. (If it was large, you'd be hard pressed knowing for sure.) So once again, a small list means that the actual saving from using the "right" list type (with an initial capacity) will be negligible.

If the optimization is not going to achieve any significant performance advantage, it is a waste of time from that perspective. And the doctrine of avoiding premature optimization is about avoiding wasting your time on optimizing things that have no significance.

Your example is certainly not something you should critique on performance grounds if you found it in your colleague's code. Not least because, he'd be right about "premature optimization" in this case.

However - if it was my code, I'd use an ArrayList (maybe with an initial capacity) anyway, on the grounds that it is the more elegant (and marginally more readable) way to do the job.

But you should not go around critiquing other people's code for being inelegant, because the question of what is elegant is highly subjective, and an endless source of pointless arguments.

  • critiquing performance issues (when significant) - yes,
  • ... correctness - yes,
  • ... readability/maintainability - yes,
  • ... conformance to agreed/mandated style - yes,
  • ... poor spelling and grammar - maybe,
  • ... elegance - no.
Stephen C
this was very helpful thanks !
Simeon
+9  A: 

Actually, if you want to return an array, why create an intermediary list at all? I find array initializers quite a bit more readable:

return new Something[] {
    Something.RED,
    Something.BLUE,
    Something.GREEN,
    Something.PURPLE,
    Something.YELLOW
};

Note that my argument is about readability, not performance. The number of constants is finite, and assuming there aren't thousands of them this method will execute within micro seconds, so unless it is called millions of times a second, it won't significantly tax a cpu - and even if that method were performance critical, one wouldn't fix this by replacing a LinkedList by an ArrayList, but by keeping a copy of the array around, or even redefine the method contract, so the method may return the same immutable instance each time rather than constructing a new one for each call. Bottom line: Replacing a LinkedList by an ArrayList for this usage pattern will only improve this method's runtime by a small constant factor, which hardly ever matters.

Alternatively, if you need a List, you can do:

return Arrays.asList(
    Something.RED,
    Something.BLUE,
    Something.GREEN,
    Something.PURPLE,
    Something.YELLOW
);
meriton
The static array initializer would be the correct answer in my opinion.
Jonathan
--------- Bingo ------
Robin
A: 

If you know you want an array at the end, and you know the number of elements in advance, why not just create an array to begin with? I don't see what is gained by creating some intermediate container object. It's not even so much a matter of optimization as it is just clarity of code.

o. nate
IMO doing list.add(item); is easier to read than array[5] = item;, a little easier maybe but still :) I think this is why the author uses these intermediate container objects. I myself also prefer to go through a collection usually, than directly creating arrays
Simeon
+1  A: 

You are both wrong.

  1. Selecting the correct algorithm ahead of time is an essential part of program design, not an 'optimization'.

  2. He has already chosen a LinkedList. He has to choose something. Making that decision isn't an 'optimization', it is an essential pre-condition to getting the program to compile.

  3. ArrayList.toArray(T[]) is not and cannot possibly be a constant-time O(1) operation.

EJP
I agree it is not O(1), this is my mistake. I assumed it is (without looking...), because conceptually an ArrayList is simply a self-resizable array and if your intention is to return an array the ARRAYlist is the obvious choice. IMO still the ArrayList.toArray(T[]) is faster than the LinkedList.toArray(T[]), but I really don't want to start a discussion on that and this is not the point of the question, nor would I criticize someone's code over this.
Simeon
If you think it is faster you must have a *reason.* What is it?
EJP
@EJP Not necessarily. Selecting the correct algorithm ahead of time is "design not optimization" only if that choice has an effect on the rest of the architecture. In this case, assuming the rest of the code only uses the List interface, not LinkedList, the algorithm can be changed very cheaply. Spending a long time discussing which list implementation to use at this stage would be a complete waste of time.
DJClayworth
I disagree. Selecting the correct algorithm is a critical part of program design. *In this case* because of the way Collections have been designed the decision can be deferred. That fact doesn't make the decision an 'optimization' let alone a 'premature optimization' (a much-abused phrase by the way, frequently as here used in ways that Don Knuth did not intend).
EJP