views:

226

answers:

7

I was wondering, is the size() method that you can call on a existing ArrayList<T> cached? Or is it preferable in performance critical code that I just store the size() in a local int?

I would expect that it is indeed cached, when you don't add/remove items between calls to size().

Am I right?

update
I am not talking about inlining or such things. I just want to know if the method size() itself caches the value internally, or that it dynamically computes every time when called.

A: 

The obvious implementation of ArrayList would be to store the size internally in a field. I would be very surprised if it ever had to be computed, even after resizing.

Marcelo Cantos
A: 

Why would it need to be? ArrayList implements a List interface that is backed by an array, after all.

I would assume it to just have a size member, that is incremented when you insert things and decremented when you remove, and then it just returns that.

I haven't looked at more than the API docs now, though.

unwind
@unwind, because function invocation overhead is non-neglibible. You are paying a lot just to be returned the value of a field. If it is being called over and over again in a loop, you might get some savings from eliding the call.
Michael Aaron Safyan
@Michael the JIT is smart enough to inline such simple methods, so then you don't have the function call overhead.
Jesper
@Michael: Sure, but that wasn't what the question was about. The question was whether ArrayList *caches* the size, so that size() can just return a cached value rather than "recompute it". My answer tries to explain that there is very likely nothing to compute, and others seem to back that up after checking the source. This has nothing to do with method call overhead.
unwind
@unwind, actually the question wasn't about whether *ArrayList* caches the size, but whether the size *is cached*, which can imply different places at which it was cached. It is obvious that an array list is going to simply return the size field of the array in O(1) time, so I interpreted it as asking whether the result of invoking size() is cached in a local variable in the function that calls size(), which I think is a perfectly valid (and, in fact, more logical) interpretation of the question.
Michael Aaron Safyan
@unwind, nevermind, the OP has clarified his question.
Michael Aaron Safyan
+6  A: 

I don't think I'd say it's "cached" as such - but it's just stored in a field, so it's fast enough to call frequently.

The Sun JDK implementation of size() is just:

public int size() {
    return size;
}
Jon Skeet
@Jon, but in performance critical code, that is paying a lot for function call overhead, when the value will be the same.
Michael Aaron Safyan
@Michael but the JIT is smart enough to inline such calls, so then you don't have the function call overhead.
Jesper
@Jesper - I appreciate that it may be theoretically possible to inline the call, but which VM actually does this?
spong
@spong the Sun JVM does that. It's more sophisticated than you think.
Jesper
@spong: since the Sun JVM did this since Release 1.2 (i.e. since around 1998), I think we can assume that all of them do, except perhaps embedded ones. Reference: http://java.sun.com/developer/onlineTraining/Programming/JDCBook/perf2.html
Michael Borgwardt
@Michael, thanks for the link. That clarifies things.
Michael Aaron Safyan
@Jesper - I think you're assuming that the call doesn't come in an already inlined method. That's likely true, but there are limitations that might cause the JVM to choose not to inline an otherwise eligible method. If you're having performance problems, storing the value in a local variable if you know it won't change would be an obvious first step.
tvanfosson
@Jesper - have a look at http://pastebin.com/b6FJw7AX - on my JVM, the (ahem) cached version runs in about 2/3 of time time using `Java HotSpot(TM) Client VM (build 1.5.0_22-147, mixed mode, sharing)` and about 1/2 the time using `Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01-101, mixed mode)`. Is there something I'm missing?
spong
@spong, my guess is that even with inlining, a field access implies a memory lookup, while saving to a local variable, explicitly, will cause a CPU register to be used.
Michael Aaron Safyan
Great, that's exactly what I needed to know.
Peterdk
@Michael Borgwardt the linked sun document states, amongst other things "...the addCount method doesn't look very attractive to the inline detector in the VM because the addCount method returns a value". So not applicable to the `size()` method.
spong
+2  A: 

I don't know the answer for sure, but my guess would be: no. There is no way for a Java compiler, short of special casing ArrayList, to know that the functions you invoke will be non-mutating and that, as a result, the invocation of size() should return the same value. Therefore, I find it highly unlikely that a Java compiler will factor out repeated calls to size() and store them in a temporary value. If you need that level of optimization then you should store the value in a local variable yourself. Otherwise, yes, you will pay for the function invocation overhead associated with calling the size() method. Note, though, that the size() method is O(1) for an ArrayList (though the function call overhead is pretty hefty). Personally, I would factor out any calls to size() from loops and manually store them in a local where applicable.

Edit
Even though such an optimization cannot be performed by a Java compiler, it has been aptly pointed out that the JIT can inline the implementation of ArrayList.size() such that it only costs the same as a field access, without any additional method call overhead, so in effect the costs are negligible, although you might still save slightly by manually saving in a temporary (which could potentially eliminate a memory lookup and instead serve the variable out of a CPU register).

Michael Aaron Safyan
Why the downvote?
Michael Aaron Safyan
Presumably because your answer is factually inaccurate - `size()` does indeed return without performing any additional calculations, and the code does know which method implementations modify size and which do not. Your discussion of compilers is a red herring, this is implemented at the source code level and is very simple.Besides, saying that you don't know the answer for sure (when it would take two minutes to check) and answering anyway isn't very helpful.
Andrzej Doyle
@Michael -- I assume that it's because everyone else is interpreting the question as "is the value stored internally or computed each time" and you are interpreting it as "will you have to call the method each time or is the value stored in a register after the first call."
tvanfosson
@Andrzej, I didn't imply that size() does additional computation. I even said that it is O(1). You clearly do not understand the question, and this would have to be implemented by the Java compiler (of which there are several implementation, making it impossible to give a single answer without looking at all implementations).
Michael Aaron Safyan
@tvanfosson, yeah, I guess so... having come from a C++ background where "const" is used specifically as a way to hint to the compiler that it is safe to elide function calls and simply store the result in a local variable, I clearly have a different reading of the question than some of these other folks.
Michael Aaron Safyan
@Andrzej, also it would take some non-trivial analysis for a compiler to determine that it was safe to elide the call. And given that different JARs can be used, implementing the same class, it would really be impossible for the compiler to know that the JAR that will be loaded at runtime will have the same behavior with respect to whether it mutates the result of the size() call.
Michael Aaron Safyan
Function call overhead is generally considered negligible in Java code, since the JIT compiler can easily inline trivial functions like this. It doesn't have to keep the value separately, it merely has to make the client code access the underlying field directly.
Michael Borgwardt
This kind of thing is *of course* not done by the java compiler, but by the JIT compiler at runtime.
Michael Borgwardt
Given our alternative understandings of the question, I see where you're coming from. You were talking about whether the overhead of a method dispatch could be avoided, whereas I (and presumably the downvoter) thought that Peterdk was asking whether the cost of (internally) iterating over all the elements to count could be avoided. Hence why compilers and inlining seemed offtopic. Though actually most optimising compilation happens at runtime (in Hotspot) so dynamic recompilation to inline methods (and reverting out later if a new subclass is loaded) is in fact possible.
Andrzej Doyle
+3  A: 

Yes.

A quick look at the Java source would tell you the answer.

Tim Drisdelle
@Tim, mind pointing out where in the code that is?
Michael Aaron Safyan
@Michael - In the `src.zip` file in your JDK installation directory.
Jesper
Sorry! I made assumptions - I generally keep access to JDK source from within my IDE. Thanks for the hand, Jesper.
Tim Drisdelle
@Jesper, where is it being stored in a local variable?
Michael Aaron Safyan
@Michael did you look at the source code of `java.util.ArrayList`? There's a `private int size;` on line 98 (in Sun JDK 1.6.0_20).
Jesper
@Jesper, we are interpreting the question differently. I was asking, where does it ensure that it gets stored and reused in a local variable *in the caller*, not in the function, itself. Also, that isn't a local variable; that is a member variable. But, nevermind, Michael has clarified the optimization.
Michael Aaron Safyan
+1  A: 

This is the implementation in OpenJDK version:

/**
 * Returns the number of elements in this list.
 *
 * @return the number of elements in this list
 */
public int size() {
    return size;
}

So it's as good as a method call is going to get. It's not very likely that HotSpot caches the value returned by this method, so if you're really THAT concerned, you can cache it yourself. Unless your profiling has shown that this is a bottleneck, though (not very likely), you should just concern yourself with readability rather than whether a simple method call that returns the value of a field is cached or not.

polygenelubricants
A: 

If caching the result of the size() method would noticeably improve performance (which it sometimes does - I regularly see ArrayList.size() as the top compiled method in my -Xprof output) then consider converting the whole list to an array for an even greater speedup.

Here's one trick that can work if you iterate over the list regularly but only rarely update it:

class FooProcessor {

    private Foo[] fooArray = null;
    private List<Foo> fooList = new ArrayList<Foo>();

    public void addFoo(Foo foo) {
       fooList.add(foo);
       fooArray = null;
    }

    public void processAllFoos() {
        Foo[] foos = getFooArray();
        for (int i = 0; i < foos.length; ++ i) {
            process(foos[i]);
        }
    }

    private void getFooArray() {
        if (fooArray == null) {
            Foo[] tmpArray = new Foo[fooList.size()];
            fooArray = fooList.toArray(tmpArray);
        }
        return fooArray;
    }

}
finnw