Why does the classic implementation of Vector (ArrayList for Java people) double its internal array size on each expansion instead of tripling or quadrupling it?

+4  A: 

Exponentially doubling the size of the array (or string) is a good compromise between having enough cells in the array and wasting too much memory.

Say we start off with 10 elements:

1 - 10
2 - 20
3 - 40
4 - 80
5 - 160

When we triple the size, we grow too fast:

1 - 10
2 - 30
3 - 90
4 - 270
5 - 810

In practice you would grow maybe 10 or 12 times. If you triple, you would maybe do it 7 or 8 times - the runtime hit for reallocating this few times is small enough not to worry about, but you are more likely to completely overshoot the required size.
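
A quick way to see this trade-off is to simulate both policies and count the reallocations and leftover capacity. This is a minimal, hypothetical sketch (the `simulateGrowth` helper and the starting capacity of 10 are illustrative, not taken from any real implementation):

```java
// Simulate how many reallocations each growth factor needs to reach a
// target capacity, and how much capacity ends up unused.
public class GrowthDemo {
    static void simulateGrowth(String label, int factor, int target) {
        int capacity = 10;      // illustrative starting size
        int reallocations = 0;
        while (capacity < target) {
            capacity *= factor; // grow by the chosen factor
            reallocations++;
        }
        System.out.printf("%s: %d reallocations, final capacity %d (%d unused)%n",
                label, reallocations, capacity, capacity - target);
    }

    public static void main(String[] args) {
        int target = 10_000;    // number of elements we actually need
        simulateGrowth("doubling", 2, target); // more reallocations, less waste
        simulateGrowth("tripling", 3, target); // fewer reallocations, more waste
    }
}
```

Running it for 10,000 elements: doubling takes 10 reallocations and overshoots by 240 cells, while tripling takes 7 reallocations but overshoots by 11,870 cells.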

Igor Zevaka
Ok, but then you could argue that the vector could just expand by one more element, or expand by half as many elements. Is there any particular reason for doubling it?
Absolute0
If your current size is 1,000,000 cells then doubling and copying seems very expensive.
Absolute0
When you double, you are guaranteed to waste at **most** the amount of memory you are actually using. The point of growing exponentially is to not have to grow at all once you are around the target size.
Igor Zevaka
Guaranteed? Rarely is anything guaranteed in coding. :) Your next input size could be n! of what you currently have...
Absolute0
Actually, things can be guaranteed by requirements. If your requirement is to process input from files, and each file contains 50 records, then you could add 50 new elements just before you start reading a file, for example...other than that, it's all about tradeoffs between memory and performance. Do you need to be sure that you always have sufficient room without needing to copy? Or do you need a smaller memory footprint and can afford the copying?
Thomas Owens
I'll rephrase: at any given time, for a vector of size N, the most physical memory it uses is 2*N.
Igor Zevaka
@Ramin: and the reason not to add just one more element is that adding N elements would then require time proportional to N^2, since you'd resize on every insert. With exponential growth, it requires time proportional to N (see the sketch after this comment thread).
Steve Jessop
_ahem_ ... 90*3 != 180
chrispy
Too true. I've been outsourcing my maths to computers for too long :)
Igor Zevaka
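
To make the amortized-cost point from these comments concrete, here is a small hedged sketch (the counting helpers are illustrative) comparing the total number of element copies performed when appending n elements under the two policies:

```java
// Count total element copies when appending n elements one at a time.
public class CopyCost {
    // Grow-by-one: every append past the current capacity copies all
    // existing elements, so total copies = 1 + 2 + ... + (n-1) = O(n^2).
    static long copiesGrowByOne(int n) {
        long copies = 0;
        for (int size = 1; size < n; size++) copies += size;
        return copies;
    }

    // Doubling: copies happen only at capacities 1, 2, 4, 8, ...,
    // so total copies = 1 + 2 + 4 + ... < 2n = O(n).
    static long copiesDoubling(int n) {
        long copies = 0;
        for (int cap = 1; cap < n; cap *= 2) copies += cap;
        return copies;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println("grow by one: " + copiesGrowByOne(n)); // ~5 * 10^11
        System.out.println("doubling:    " + copiesDoubling(n));  // ~10^6
    }
}
```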
+2  A: 

If you are asking about the Java-specific implementation of Vector and ArrayList, then it's not necessarily doubled on each expansion.

From the Javadoc for Vector:

Each vector tries to optimize storage management by maintaining a capacity and a capacityIncrement. The capacity is always at least as large as the vector size; it is usually larger because as components are added to the vector, the vector's storage increases in chunks the size of capacityIncrement. An application can increase the capacity of a vector before inserting a large number of components; this reduces the amount of incremental reallocation.

One of the constructors for Vector allows you to specify the initial size and capacity increment for the Vector. The Vector class also provides the ensureCapacity(int minCapacity) and setSize(int newSize) methods, for manually adjusting the minimum size of the Vector and for resizing the Vector on your own.

The ArrayList class is very similar:

Each ArrayList instance has a capacity. The capacity is the size of the array used to store the elements in the list. It is always at least as large as the list size. As elements are added to an ArrayList, its capacity grows automatically. The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost.

An application can increase the capacity of an ArrayList instance before adding a large number of elements using the ensureCapacity operation. This may reduce the amount of incremental reallocation.

If you are asking about the general implementation of a vector, then the choice of when to grow and by how much is a trade-off. Generally, vectors are backed by arrays, and arrays are of a fixed size. Resizing a vector because it's full means copying all the elements of the old array into a new, larger array. If you make the new array too large, you have allocated memory that you will never use. If you make it too small, you will soon have to copy everything again into yet another array - and copying is an operation you don't want to perform very often.
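
For example, here is a minimal sketch of the capacity-related API the Javadoc describes (the sizes used are arbitrary):

```java
import java.util.ArrayList;
import java.util.Vector;

public class CapacityDemo {
    public static void main(String[] args) {
        // Initial capacity 10, growing by a fixed increment of 5 when
        // full (rather than doubling).
        Vector<Integer> vector = new Vector<>(10, 5);

        // setSize resizes the Vector directly, padding with nulls or
        // truncating as needed.
        vector.setSize(20);
        System.out.println(vector.size()); // 20

        // Pre-size an ArrayList before a bulk insert to avoid
        // incremental reallocation.
        ArrayList<Integer> list = new ArrayList<>();
        list.ensureCapacity(100_000);
        for (int i = 0; i < 100_000; i++) {
            list.add(i);
        }
    }
}
```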

Thomas Owens
A: 

There is no performance reason for doubling vs. tripling or quadrupling, as they all have the same big-O performance profiles. However, in absolute terms, doubling tends to be more space-efficient in the normal scenario.

Joshua
+4  A: 

If you were to allocate an unusual-sized block of memory, then when that block gets deallocated (either because you're resizing the vector or because it gets GC'd), there would be an unusual-sized hole in memory, which can cause headaches for the memory manager. So it's usually preferable to allocate memory in powers of two. In some cases the underlying memory manager will only give you blocks of certain sizes, and if you request a weird size it will round up to the next larger size anyway. So rather than asking for 470 units, getting back 512 anyway, and then resizing again once you've used all 470 that you asked for, you might as well just ask for 512 to begin with.
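
One way a vector could cooperate with such an allocator is to round every requested capacity up to the next power of two. A hypothetical sketch (`nextPowerOfTwo` is an illustrative helper, not part of any particular allocator or collection class):

```java
// Round a requested capacity up to the next power of two, so the
// underlying allocator is always asked for "friendly" block sizes.
public class PowerOfTwo {
    static int nextPowerOfTwo(int n) {
        int highest = Integer.highestOneBit(n); // largest power of two <= n
        return (highest == n) ? n : highest << 1;
    }

    public static void main(String[] args) {
        System.out.println(nextPowerOfTwo(470)); // 512
        System.out.println(nextPowerOfTwo(512)); // 512
    }
}
```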

kwatford
I disagree with this answer. I'm not sure it answers 'why not a growth rate of 3 or 4 or 5?'. It answers a slightly different question (why allocate memory on power-of-two boundaries?).
Jeremy Powell
It's certainly not the one I would have picked. I thought of it more as a supplementary answer. In addition to the other well-explained reasons about growth rate, you're wasting resources if the new array isn't a power of two. So considering the other arguments as to why a larger multiplier wouldn't be good, the only power of two that fits well is 2. It does presuppose that the initial size was also a power of two, of course, but I think most vector classes try to arrange for that.
kwatford
Right. 'disagree' was probably a bit strong :) Also, you could definitely engineer an algorithm that gets you roughly 1.5x growth while still keeping things word-aligned. If a byte array is 64 bytes long, you could add 32 bytes to it and still keep the word alignment.
Jeremy Powell
An array with a power-of-two number of elements in Java or C# will be an object with a header and a length field, so it is the worst size for the memory manager - a power of two plus a few bytes. The same is true in C or C++ if malloc uses a header.
Pete Kirkham
+1  A: 

Personally, I think it's an arbitrary choice. We could use base e instead of base 2 (instead of doubling, just multiply the size by (1 + e)).

If you are going to be adding large numbers of elements to the vector, then it would be advantageous to have a high base (to reduce the amount of copying you will be doing). On the flip side, if you only need to store a few members on average, then a low base will be fine and will reduce the amount of overhead, hence speeding things up.

Base 2 is a compromise.
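
As a sketch of what that choice looks like in code, the growth factor could simply be a parameter of the container (the `GrowableArray` class and its constants are purely illustrative):

```java
// A toy growable array with a configurable growth factor, showing how
// the base of the exponential growth is a tunable trade-off.
public class GrowableArray {
    private Object[] data;
    private int size;
    private final double growthFactor;

    public GrowableArray(int initialCapacity, double growthFactor) {
        this.data = new Object[initialCapacity];
        this.growthFactor = growthFactor; // e.g. 1.5, 2.0, or 3.0
    }

    public void add(Object element) {
        if (size == data.length) {
            // Higher factor: fewer copies, but more unused memory.
            // Lower factor: smaller footprint, but more frequent copies.
            int newCapacity = Math.max(size + 1, (int) (data.length * growthFactor));
            Object[] bigger = new Object[newCapacity];
            System.arraycopy(data, 0, bigger, 0, size);
            data = bigger;
        }
        data[size++] = element;
    }

    public static void main(String[] args) {
        GrowableArray a = new GrowableArray(4, 1.5);
        for (int i = 0; i < 100; i++) a.add(i); // triggers several regrows
    }
}
```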

ldog
+3  A: 

Any multiple is a compromise. Make it too big and you waste too much memory. Make it too small and you waste too much time on reallocations and copying. I guess doubling is used because it works and is very easy to implement. I have also seen a proprietary STL-like library that uses 1.5 as the multiplier for the same purpose - I guess its developers considered doubling to waste too much memory.
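
For what it's worth, recent OpenJDK versions of ArrayList also grow by roughly 1.5x rather than doubling. The core of the growth computation looks like this (a simplified sketch, not the verbatim source):

```java
// Simplified sketch of ArrayList's ~1.5x growth step:
// newCapacity = oldCapacity + oldCapacity / 2, computed with a shift.
static int grow(int oldCapacity) {
    return oldCapacity + (oldCapacity >> 1);
}
```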

sharptooth
+9  A: 
Pete Kirkham
A: 

Can anyone help me with how I would calculate the big-O for the growth of the Vector?

Giusepe
This should be a separate question. You should also give more details on what you have tried and where you need help.
sth