Why does the classic implementation of Vector (ArrayList for Java people) double its internal array size on each expansion instead of tripling or quadrupling it?

+4  A: 

Exponentially doubling the size of the array (or string) is a good compromise between having enough cells in the array and wasting too much memory.

Say we start off with 10 elements:

1 - 10
2 - 20
3 - 40
4 - 80
5 - 160

When we triple the size, we grow too fast:

1 - 10
2 - 30
3 - 90
4 - 270
5 - 810

In practice you would grow maybe 10 or 12 times. If you triple, you would maybe do it 7 or 8 times - the runtime hit for reallocating this few times is small enough not to worry about, but you are more likely to completely overshoot the required size.
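
A quick way to see this trade-off is to simulate both policies and count the reallocations and leftover capacity. This is a minimal, hypothetical sketch (the `simulateGrowth` helper and the starting capacity of 10 are illustrative, not taken from any real implementation):

```java
// Simulate how many reallocations each growth factor needs to reach a
// target capacity, and how much capacity ends up unused.
public class GrowthDemo {
    static void simulateGrowth(String label, int factor, int target) {
        int capacity = 10;      // illustrative starting size
        int reallocations = 0;
        while (capacity < target) {
            capacity *= factor; // grow by the chosen factor
            reallocations++;
        }
        System.out.printf("%s: %d reallocations, final capacity %d (%d unused)%n",
                label, reallocations, capacity, capacity - target);
    }

    public static void main(String[] args) {
        int target = 10_000;    // number of elements we actually need
        simulateGrowth("doubling", 2, target); // more reallocations, less waste
        simulateGrowth("tripling", 3, target); // fewer reallocations, more waste
    }
}
```

Running it for 10,000 elements: doubling takes 10 reallocations and overshoots by 240 cells, while tripling takes 7 reallocations but overshoots by 11,870 cells.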

Igor Zevaka
Ok, but then you could argue that the vector could just expand by one more element, or expand by half as many elements. Is there any particular reason for doubling it?
Absolute0
If your current size is 1,000,000 cells then doubling and copying seems very expensive.
Absolute0
When you double, you are guaranteed to waste at **most** the amount of memory you are actually using. The point of growing exponentially is to not have to grow at all once you are around the target size.
Igor Zevaka
Guaranteed? Rarely is anything guaranteed in coding. :) Your next input size could be n! of what you currently have...
Absolute0
Actually, things can be guaranteed by requirements. If your requirement is to process input from files, and each file contains 50 records, then you could add 50 new elements just before you start reading a file, for example...other than that, it's all about tradeoffs between memory and performance. Do you need to be sure that you always have sufficient room without needing to copy? Or do you need a smaller memory footprint and can afford the copying?
Thomas Owens
I'll rephrase: at any given time, for a vector of size N, the most physical memory it uses is 2*N.
Igor Zevaka
@Ramin: and the reason not to add just one more element is that adding N elements would then require time proportional to N^2, since you'd resize on every insert. With exponential growth, it requires time proportional to N (see the sketch after this comment thread).
Steve Jessop
_ahem_ ... 90*3 != 180
chrispy
Too true. I've been outsourcing my maths to computers for too long :)
Igor Zevaka
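
To make the amortized-cost point from these comments concrete, here is a small hedged sketch (the counting helpers are illustrative) comparing the total number of element copies performed when appending n elements under the two policies:

```java
// Count total element copies when appending n elements one at a time.
public class CopyCost {
    // Grow-by-one: every append past the current capacity copies all
    // existing elements, so total copies = 1 + 2 + ... + (n-1) = O(n^2).
    static long copiesGrowByOne(int n) {
        long copies = 0;
        for (int size = 1; size < n; size++) copies += size;
        return copies;
    }

    // Doubling: copies happen only at capacities 1, 2, 4, 8, ...,
    // so total copies = 1 + 2 + 4 + ... < 2n = O(n).
    static long copiesDoubling(int n) {
        long copies = 0;
        for (int cap = 1; cap < n; cap *= 2) copies += cap;
        return copies;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println("grow by one: " + copiesGrowByOne(n)); // ~5 * 10^11
        System.out.println("doubling:    " + copiesDoubling(n));  // ~10^6
    }
}
```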
+2  A: 

If you are asking about the Java-specific implementation of Vector and ArrayList, then it's not necessarily doubled on each expansion.

From the Javadoc for Vector:

Each vector tries to optimize storage management by maintaining a capacity and a capacityIncrement. The capacity is always at least as large as the vector size; it is usually larger because as components are added to the vector, the vector's storage increases in chunks the size of capacityIncrement. An application can increase the capacity of a vector before inserting a large number of components; this reduces the amount of incremental reallocation.

One of the constructors for Vector allows you to specify the initial size and capacity increment for the Vector. The Vector class also provides the ensureCapacity(int minCapacity) and setSize(int newSize) methods, for manually adjusting the minimum size of the Vector and for resizing the Vector on your own.

The ArrayList class is very similar:

Each ArrayList instance has a capacity. The capacity is the size of the array used to store the elements in the list. It is always at least as large as the list size. As elements are added to an ArrayList, its capacity grows automatically. The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost.

An application can increase the capacity of an ArrayList instance before adding a large number of elements using the ensureCapacity operation. This may reduce the amount of incremental reallocation.

If you are asking about the general implementation of a vector, then the choice of when to grow and by how much is a trade-off. Generally, vectors are backed by arrays, and arrays are of a fixed size. Resizing a vector because it's full means copying all the elements of the old array into a new, larger array. If you make the new array too large, you have allocated memory that you will never use. If you make it too small, you will soon have to copy everything again into yet another array - and copying is an operation you don't want to perform very often.
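
For example, here is a minimal sketch of the capacity-related API the Javadoc describes (the sizes used are arbitrary):

```java
import java.util.ArrayList;
import java.util.Vector;

public class CapacityDemo {
    public static void main(String[] args) {
        // Initial capacity 10, growing by a fixed increment of 5 when
        // full (rather than doubling).
        Vector<Integer> vector = new Vector<>(10, 5);

        // setSize resizes the Vector directly, padding with nulls or
        // truncating as needed.
        vector.setSize(20);
        System.out.println(vector.size()); // 20

        // Pre-size an ArrayList before a bulk insert to avoid
        // incremental reallocation.
        ArrayList<Integer> list = new ArrayList<>();
        list.ensureCapacity(100_000);
        for (int i = 0; i < 100_000; i++) {
            list.add(i);
        }
    }
}
```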

Thomas Owens
A: 

There is no performance reason for doubling vs. tripling or quadrupling, as they all have the same big-O performance profiles. However, in absolute terms, doubling tends to be more space-efficient in the normal scenario.

Joshua
+4  A: 

If you were to allocate an unusual-sized block of memory, then when that block gets deallocated (either because you're resizing the vector or because it gets GC'd), there would be an unusual-sized hole in memory, which can cause headaches for the memory manager. So it's usually preferable to allocate memory in powers of two. In some cases the underlying memory manager will only give you blocks of certain sizes, and if you request a weird size it will round up to the next larger size anyway. So rather than asking for 470 units, getting back 512 anyway, and then resizing again once you've used all 470 that you asked for, you might as well just ask for 512 to begin with.
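
One way a vector could cooperate with such an allocator is to round every requested capacity up to the next power of two. A hypothetical sketch (`nextPowerOfTwo` is an illustrative helper, not part of any particular allocator or collection class):

```java
// Round a requested capacity up to the next power of two, so the
// underlying allocator is always asked for "friendly" block sizes.
public class PowerOfTwo {
    static int nextPowerOfTwo(int n) {
        int highest = Integer.highestOneBit(n); // largest power of two <= n
        return (highest == n) ? n : highest << 1;
    }

    public static void main(String[] args) {
        System.out.println(nextPowerOfTwo(470)); // 512
        System.out.println(nextPowerOfTwo(512)); // 512
    }
}
```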

kwatford
I disagree with this answer. I'm not sure it answers 'why not a growth rate of 3 or 4 or 5?'. It answers a slightly different question (why allocate memory on power-of-two boundaries?).
Jeremy Powell
It's certainly not the one I would have picked. I thought of it more as a supplementary answer. In addition to the other well-explained reasons about growth rate, you're wasting resources if the new array isn't a power of two. So considering the other arguments as to why a larger multiplier wouldn't be good, the only power of two that fits well is 2. It does presuppose that the initial size was also a power of two, of course, but I think most vector classes try to arrange for that.
kwatford
Right. 'disagree' was probably a bit strong :) Also, you could definitely engineer an algorithm that gets you roughly 1.5x growth while still keeping things word-aligned. If a byte array is 64 bytes long, you could add 32 bytes to it and still keep the word alignment.
Jeremy Powell
An array with a power-of-two number of elements in Java or C# will be an object with a header and a length field, so it is the worst size for the memory manager - a power of two plus a few bytes. The same is true in C or C++ if malloc uses a header.
Pete Kirkham
+1  A: 

Personally, I think it's an arbitrary choice. We could use base e instead of base 2 (instead of doubling, just multiply the size by (1 + e)).

If you are going to be adding large numbers of elements to the vector, then it would be advantageous to have a high base (to reduce the amount of copying you will be doing). On the flip side, if you only need to store a few members on average, then a low base will be fine and will reduce the amount of overhead, hence speeding things up.

Base 2 is a compromise.
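
As a sketch of what that choice looks like in code, the growth factor could simply be a parameter of the container (the `GrowableArray` class and its constants are purely illustrative):

```java
// A toy growable array with a configurable growth factor, showing how
// the base of the exponential growth is a tunable trade-off.
public class GrowableArray {
    private Object[] data;
    private int size;
    private final double growthFactor;

    public GrowableArray(int initialCapacity, double growthFactor) {
        this.data = new Object[initialCapacity];
        this.growthFactor = growthFactor; // e.g. 1.5, 2.0, or 3.0
    }

    public void add(Object element) {
        if (size == data.length) {
            // Higher factor: fewer copies, but more unused memory.
            // Lower factor: smaller footprint, but more frequent copies.
            int newCapacity = Math.max(size + 1, (int) (data.length * growthFactor));
            Object[] bigger = new Object[newCapacity];
            System.arraycopy(data, 0, bigger, 0, size);
            data = bigger;
        }
        data[size++] = element;
    }

    public static void main(String[] args) {
        GrowableArray a = new GrowableArray(4, 1.5);
        for (int i = 0; i < 100; i++) a.add(i); // triggers several regrows
    }
}
```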

ldog
+3  A: 

Any multiple is a compromise. Make it too big and you waste too much memory. Make it too small and you waste too much time on reallocations and copying. I guess doubling is used because it works and is very easy to implement. I have also seen a proprietary STL-like library that uses 1.5 as the multiplier for the same purpose - I guess its developers considered doubling to waste too much memory.
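
For what it's worth, recent OpenJDK versions of ArrayList also grow by roughly 1.5x rather than doubling. The core of the growth computation looks like this (a simplified sketch, not the verbatim source):

```java
// Simplified sketch of ArrayList's ~1.5x growth step:
// newCapacity = oldCapacity + oldCapacity / 2, computed with a shift.
static int grow(int oldCapacity) {
    return oldCapacity + (oldCapacity >> 1);
}
```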

sharptooth
+9  A: 
Pete Kirkham
A: 

Can anyone help me with how I would calculate the big-O for the growth of the Vector?

Giusepe
This should be a separate question. You should also give more details on what you have tried and where you need help.
sth