Why does the classic implementation of Vector (ArrayList for Java people) double its internal array size on each expansion instead of tripling or quadrupling it?
Growing the array (or string) exponentially by doubling is a good compromise between having enough cells available and wasting too much memory.
Say we start off with a capacity of 10 and double on each expansion:
1 - 10
2 - 20
3 - 40
4 - 80
5 - 160
When we triple the size instead, we grow too fast:
1 - 10
2 - 30
3 - 90
4 - 270
5 - 810
In practice you would reallocate maybe 10 or 12 times. If you triple, you would maybe do it 7 or 8 times; the runtime cost of those few extra reallocations is too small to worry about, but you are much more likely to completely overshoot the required size.
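To make the comparison concrete, here is a small sketch (class and method names are my own) that counts how many reallocations are needed to reach a target size for a given growth factor, starting from a capacity of 10:

```java
// Hypothetical sketch: count the reallocations needed to reach a target
// capacity for a given growth factor, starting from capacity 10.
public class GrowthDemo {
    static int reallocations(int factor, int target) {
        int capacity = 10;
        int count = 0;
        while (capacity < target) {
            capacity *= factor;   // each expansion copies the whole array
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Doubling from 10 reaches 10,000 elements in 10 steps (final
        // capacity 10,240); tripling gets there in 7 steps but ends at
        // 21,870, overshooting by more than twice the requested size.
        System.out.println("doubling: " + reallocations(2, 10_000));
        System.out.println("tripling: " + reallocations(3, 10_000));
    }
}
```

The three saved reallocations from tripling buy very little, while the overshoot roughly doubles.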
If you are asking about the Java-specific implementation of Vector and ArrayList, then it's not necessarily doubled on each expansion.
From the Javadoc for Vector:
Each vector tries to optimize storage management by maintaining a capacity and a capacityIncrement. The capacity is always at least as large as the vector size; it is usually larger because as components are added to the vector, the vector's storage increases in chunks the size of capacityIncrement. An application can increase the capacity of a vector before inserting a large number of components; this reduces the amount of incremental reallocation.
One of the constructors for Vector allows you to specify the initial capacity and capacity increment for the Vector. The Vector class also provides the ensureCapacity(int minCapacity) and setSize(int newSize) methods, for manually adjusting the minimum capacity of the Vector and resizing the Vector on your own.
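For illustration, here is a short sketch (the class and helper method are my own) exercising that constructor along with ensureCapacity and setSize:

```java
import java.util.Vector;

public class VectorCapacityDemo {
    // Returns {capacity before, capacity after ensureCapacity, final size}.
    static int[] demo() {
        // Vector(initialCapacity, capacityIncrement): grow linearly by 5
        // slots per expansion instead of the default doubling.
        Vector<Integer> v = new Vector<>(10, 5);
        int before = v.capacity();          // 10

        // Pre-size before a bulk insert to avoid incremental reallocation.
        v.ensureCapacity(100);
        int after = v.capacity();           // grows to at least 100

        v.setSize(3);                       // pads with nulls up to size 3
        return new int[] { before, after, v.size() };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("capacity before: " + r[0]
                + ", after: " + r[1] + ", size: " + r[2]);
    }
}
```

Note that Vector, unlike ArrayList, exposes capacity() directly, which makes the effect observable.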
The ArrayList class is very similar:
Each ArrayList instance has a capacity. The capacity is the size of the array used to store the elements in the list. It is always at least as large as the list size. As elements are added to an ArrayList, its capacity grows automatically. The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost.
An application can increase the capacity of an ArrayList instance before adding a large number of elements using the ensureCapacity operation. This may reduce the amount of incremental reallocation.
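As a quick sketch of that advice (the wrapper class and method name are my own), pre-sizing before a bulk insert looks like this:

```java
import java.util.ArrayList;

public class EnsureCapacityDemo {
    static ArrayList<Integer> fill(int n) {
        ArrayList<Integer> list = new ArrayList<>();
        // One up-front allocation instead of a series of incremental
        // grow-and-copy steps. ArrayList exposes no capacity() accessor,
        // so the effect is not directly observable, but the backing
        // array is resized once rather than repeatedly.
        list.ensureCapacity(n);
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(fill(1_000).size());
    }
}
```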
If you are asking about the general implementation of a vector, then the choice of when to increase the size, and by how much, is a trade-off. Generally, vectors are backed by arrays. Arrays are of a fixed size. To resize a vector because it's full, you have to copy all elements of the array into a new, larger array. If you make your new array too large, then you have allocated memory that you will never use. If it's too small, you will have to copy the elements from the old array into a new, larger array again soon - an operation that you don't want to perform very often.
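A minimal growable array makes this copy-on-resize trade-off visible. This is a hypothetical sketch (the class, field, and method names are invented here), instrumented to count how many element copies the resizes cost in total:

```java
import java.util.Arrays;

// Hypothetical minimal growable int array illustrating the
// copy-on-resize trade-off; doubles its backing array when full.
public class IntVector {
    private int[] data = new int[10];
    private int size = 0;
    private int copies = 0;   // total elements copied across all resizes

    void add(int value) {
        if (size == data.length) {
            // Full: allocate a larger array and copy everything over.
            copies += size;
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    int size()   { return size; }
    int copies() { return copies; }

    public static void main(String[] args) {
        IntVector v = new IntVector();
        for (int i = 0; i < 1000; i++) v.add(i);
        // With doubling, total copy work stays proportional to n:
        // 10 + 20 + 40 + ... + 640 = 1270 element copies for 1000 adds.
        System.out.println(v.copies());
    }
}
```

Because each resize copies at most as many elements as have ever been copied before, the total work is linear in n, which is where the constant amortized cost per add comes from.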
There is no asymptotic performance reason for doubling vs. tripling or quadrupling, as they all have the same big-O performance profile. However, in absolute terms doubling will tend to be more space-efficient in the normal scenario.
If you were to allocate an unusual-sized block of memory, then when that block gets deallocated (either because you're resizing it or it gets GC'd) there would be an unusual-sized hole in memory which could cause headaches for the memory manager. So it's usually preferred to allocate memory in powers of two. In some cases the underlying memory manager will only give you blocks of certain sizes, and if you request a weird size it will round up to the next larger size. So rather than ask for 470 units, get back 512 anyway, and then resize again once you've used all 470 you asked for, you might as well just ask for 512 to begin with.
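The rounding behavior described here can be sketched in a few lines (this helper is my own, not part of any real allocator API):

```java
public class PowerOfTwo {
    // Round a requested capacity up to the next power of two, as an
    // allocator that only hands out power-of-two blocks effectively does.
    static int roundUp(int n) {
        int p = 1;
        while (p < n) p <<= 1;
        return p;
    }

    public static void main(String[] args) {
        // Asking for 470 units gets you a 512-unit block anyway.
        System.out.println(roundUp(470));   // 512
    }
}
```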
Personally, I think it's an arbitrary choice. We could use base e instead of base 2 (instead of doubling, just multiply the size by (1+e)).
If you are going to be adding large numbers of elements to the vector, then it would be advantageous to have a high base (to reduce the amount of copying you will be doing). On the flip side, if you only need to store a few members on average, then a low base will be fine and will reduce the amount of overhead, hence speeding things up.
Base 2 is a compromise.
Any multiple is a compromise. Make it too big and you waste too much memory. Make it too small and you waste too much time on reallocations and copying. I guess that doubling is used because it works and is very easy to implement. I also saw a proprietary STL-like library that uses 1.5 as the multiplier for the same reason; I guess its developers considered doubling too wasteful of memory.
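The cost side of that compromise can be simulated directly. This sketch (names invented here) counts reallocations for a 1.5x policy versus a 2x policy; the smaller factor trades more frequent copies for, on average, less over-allocation:

```java
public class GrowthFactorDemo {
    // Count the reallocations needed to reach n elements for a given
    // growth factor, starting from capacity 16.
    static int reallocations(double factor, int n) {
        int capacity = 16;
        int count = 0;
        while (capacity < n) {
            capacity = (int) Math.ceil(capacity * factor);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Reaching one million elements takes 16 doublings but
        // noticeably more 1.5x expansions.
        System.out.println("1.5x: " + reallocations(1.5, 1_000_000));
        System.out.println("2.0x: " + reallocations(2.0, 1_000_000));
    }
}
```

Since each reallocation copies the whole array, a library worried about copy traffic leans toward 2x, while one worried about peak memory leans toward 1.5x.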