views: 237
answers: 9
I'm having a contest with another student to make the fastest version of our homework assignment, and I'm not using an ArrayList for performance reasons (resizing the array myself cut the benchmark time from 56 seconds to 4), but I'm wondering how much I should resize the array every time I need to. Specifically the relevant parts of my code are this:

private Node[] list;
private int size; // The number of items in the list
private static final int N = 1000; // How much to resize the list by every time (placeholder value -- what it should be is the question)

public MyClass(){
  list = new Node[N];
}

public void add(Node newNode){
  if(size == list.length){
    list = Arrays.copyOf(list, size + N);
  }
  list[size] = newNode;
  size++;
}

TL;DR: What should I make N?

+4  A: 

It's recommended to double the size of the array when resizing. Doubling the size leads to amortized linear-time cost.

The naive idea is that there are two costs associated with the resize value:

  • Copying performance costs - costs of copying the elements from previous array to new one, and
  • Memory overhead costs - cost of the allotted memory that is not used.

If you resize the array by adding one slot at a time, the memory overhead is zero, but the copying cost becomes quadratic. If you allocate far too many slots, the copying cost is linear, but the memory overhead is excessive.

Doubling leads to a linear amortized cost (i.e. over a long time, the cost of copying is linear with respect to the size of the array), and you are guaranteed not to waste more than half of the array.

UPDATE: By the way, apparently Java's ArrayList grows its capacity by a factor of 3/2. This makes it a bit more memory-conservative, but costs a bit more in copying. Benchmarking for your use case wouldn't hurt.

MINOR Correction: Doubling makes the total resizing cost linear (amortized), which in turn guarantees amortized constant-time insertion. Check CMU's lecture notes on amortized analysis.
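The doubling policy described above can be written as a minimal self-contained class. This is an illustrative sketch (the class and field names are mine, not the asker's code); the comment spells out why the total copying work stays linear:

```java
import java.util.Arrays;

// Minimal sketch of doubling growth (illustrative names, not the asker's code).
// Over n adds, the total number of elements copied during resizes is
// 16 + 32 + ... + n/2 + n < 2n, so each add() is amortized O(1).
public class DoublingList {
    private Object[] data = new Object[16]; // arbitrary initial capacity
    private int size;

    public void add(Object item) {
        if (size == data.length) {
            data = Arrays.copyOf(data, data.length * 2); // geometric growth
        }
        data[size++] = item;
    }

    public int size()     { return size; }
    public int capacity() { return data.length; }
}
```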

notnoop
I used the idea of doubling the size in my code and benchmarked it, and new ArrayList(1000) was about 100x slower than my code with initial size 1000.
Brendan Long
Doubling prevents reuse of previously-allocated space, should that space still be available. The factor *phi* defines the upper bound on growth factors that tolerate such reuse. With a first-fit allocator not suffering contention, a growth rate of *phi* only needs to allocate fresh space on *every other* reallocation.
seh
+2  A: 

If you know roughly how many items there are going to be, pre-allocate the array or the ArrayList to that size and you'll never have to expand. Unbeatable performance!

Failing that, a reasonable way to achieve good amortized cost is to keep increasing the size by some percentage; 100% or 50% are common.

Carl Smotricz
+2  A: 

You should resize your list to a multiple of its previous size, rather than adding a constant amount each time.

for example:

newSize = oldSize * 2;

not

newSize = oldSize + N;
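To see why this matters, here is a rough count of element copies each policy performs (the starting capacities of 1 and 1000 are arbitrary assumptions for illustration):

```java
// Rough count of element copies performed while appending n items.
// Assumptions: geometric growth starts at capacity 1 and doubles;
// arithmetic growth adds a fixed step of 1000 slots each resize.
public class GrowthCost {
    static long copies(int n, boolean geometric) {
        long copied = 0;
        int cap = geometric ? 1 : 1000;
        for (int size = 0; size < n; size++) {
            if (size == cap) {
                copied += size; // a resize copies every existing element
                cap = geometric ? cap * 2 : cap + 1000;
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        System.out.println(copies(1_000_000, true));  // ~1 million copies
        System.out.println(copies(1_000_000, false)); // ~500 million copies
    }
}
```

The constant-step policy copies hundreds of times more elements for a million appends, which is the quadratic-vs-linear difference in concrete numbers.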
sdtom
+2  A: 

Double the size each time you need to resize unless you know that more or less would be best.

If memory isn't an issue, just start off with a big array to begin with.

Ben S
The problem is that memory isn't an issue, but I'm reading an arbitrarily large file.
Brendan Long
Also I'm turning this in, so having list = new Node[Integer.MAX_VALUE] might make the teacher unhappy.
Brendan Long
That would also likely be more memory than the system has. I would start with something more modest, like 1024 or 2048.
Ben S
A: 

For maximum performance, you're going to want to resize as rarely as possible. Set the initial size to be as large as you'll typically need, rather than starting with N elements. The value you choose for N will matter less in that case.


If you are going to create a large number of these list objects, of varying sizes, then you'll want to use a pool-based allocator, and not free the memory until you exit.

And to eliminate the copy operation altogether, you can use a list of arrays.
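One sketch of that "list of arrays" idea, using fixed-size chunks (the chunk size and names are illustrative choices, not from this answer): growing never copies existing elements, it only allocates a fresh chunk.

```java
import java.util.ArrayList;

// Sketch of a "list of arrays": elements live in fixed-size chunks, so a
// resize never copies existing elements -- it only allocates a new chunk.
// CHUNK and the class name are illustrative, not from the answer.
public class ChunkedList {
    private static final int CHUNK = 1024;
    private final ArrayList<Object[]> chunks = new ArrayList<>();
    private int size;

    public void add(Object item) {
        if (size % CHUNK == 0) {
            chunks.add(new Object[CHUNK]); // grow without copying
        }
        chunks.get(size / CHUNK)[size % CHUNK] = item;
        size++;
    }

    public Object get(int i) {
        return chunks.get(i / CHUNK)[i % CHUNK];
    }

    public int size() { return size; }
}
```

The trade-off is an extra division and indirection on every access, so this only wins when copying costs dominate lookups.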

Mark Bessey
+2  A: 

Your code seems to do pretty much what ArrayList does - if you know you will be using a large list, you can pass it an initial capacity when you create the list and avoid resizing at all. This of course assumes that you're going for raw speed and memory consumption is not an issue.

Christian P.
I tried my code with N = 1000; vs new ArrayList(1000); and mine was ~100x faster. Good idea though, I hadn't thought to set the initial size.
Brendan Long
Seems odd, but I suspect ArrayList might have some sanity checks that slow it down.
Christian P.
@Brendan Your benchmark result seems extremely strange. Looking at the source for ArrayList, at least on my openjdk 1.6.0, it does exactly what you're doing; give or take the few arithmetic operations to calculate the new capacity (which is negligible compared to the cost of copying the array).
Kieron
Here's my test script: http://pastebin.com/m536bb968 And there's my array class: http://pastebin.com/m75f34b75 On my computer the ArrayList took about 2.5 seconds and my array took .2 seconds. I don't really know why.. I'm using the Sun JDK.
Brendan Long
Also I tried changing the second one to `list = Arrays.copyOf(list, size * 3/2);` and it didn't make a difference.
Brendan Long
Oh, it's the Object[] part. If you change my code to use Object[] list, the speeds are the same. :(
Brendan Long
+4  A: 

3/2 is likely chosen as "something that divides cleanly but is less than phi". There was an epic thread on comp.lang.c++.moderated back in November 2003 exploring how phi establishes an upper bound on reusing previously-allocated storage during reallocation for a first-fit allocator.

See post #7 from Andrew Koenig for the first mention of phi's application to this problem.
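One way to see why phi is the pivotal value is the identity phi^2 = phi + 1: with growth factor phi, each new capacity equals the sum of the previous two, so the hole left by freeing the two earlier blocks is exactly big enough for the next one. A small numeric sketch (my own illustration, not from the thread):

```java
// phi satisfies phi^2 = phi + 1, so with growth factor phi each capacity
// equals the sum of the two previous ones -- the property behind the
// first-fit reuse argument. A larger factor such as 2 overshoots: the new
// block no longer fits in the space freed by the earlier allocations.
public class PhiGrowth {
    public static void main(String[] args) {
        double phi = (1 + Math.sqrt(5)) / 2;
        double c0 = 1000, c1 = c0 * phi, c2 = c1 * phi;
        System.out.println(c2 - (c0 + c1)); // ~0: c2 fits where c0 and c1 were

        double d0 = 1000, d1 = d0 * 2, d2 = d1 * 2;
        System.out.println(d2 - (d0 + d1)); // 1000.0: doubling overshoots the hole
    }
}
```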

seh
+1 Very interesting, thanks!
Carl Smotricz
+1  A: 

From the comments of one of the answers:

The problem is that memory isn't an issue, but I'm reading an arbitrarily large file.

Try this:

new ArrayList<Node>((int)file.length());

You could do it with your array as well. Then there should be no resizing in either case, since the array will be the size of the file (assuming the file's length fits in an int...).

TofuBeer
A: 

Here's an analogy for you: long ago, when I worked on a mainframe, we used a filing system called VSAM, which required you to specify the initial file size and the amount of free space required.

Whenever the free space dropped below the required threshold, more free space would be allocated in the background while the program continued to process.

It would be interesting to see if this could be done in Java, using a separate thread to allocate the additional space and 'attach' it to the end of the array while the main thread continues processing.

crowne
I seriously doubt Java would give you that much control. The best I can do is make a new array and hope Java's array copy copies a block of memory instead of just being a for loop. :)
Brendan Long