views:

250

answers:

3

I was just wondering how exactly memory works such that a language standard (such as C++'s ISO/ANSI standard) can guarantee that any data structure (even an array) will be contiguous.

I don't even know how to write a data structure using contiguous memory, but could you please give me a short example of how designers can do this?

For instance, assuming std::vector from C++ allocates all of its memory at runtime, how would it know that the memory slots ahead of the currently allocated memory are not in use (and thus free for use by the vector)? Do vectors just look quite far ahead and hope the user doesn't push_back so many objects that they can no longer be stored in a contiguous memory block? Or does the operating system move memory around at will to keep this from becoming a problem (I have no idea how this would work)?

+3  A: 

The standards may demand that a compliant implementation behave in a certain way -- they can "guarantee" only that compliant implementations behave that way, because, by definition, an implementation that violates a requirement in the standard is NOT compliant.

How a typical implementation of std::vector complies with the contiguous-memory requirement: it allocates a certain amount of space ("capacity"), and if the current size ever catches up with the capacity, it reallocates the space (which amounts to doing a completely new allocation, typically some fixed multiple of the current capacity, copying the current contents to the new space, and releasing the previously obtained space).
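A hypothetical, stripped-down sketch of that reallocate-and-copy scheme (this is not the actual library source, and error handling for a failed allocation is omitted) might look like this, assuming a doubling growth factor:

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Simplified growable array illustrating the reallocate-and-copy scheme.
struct GrowableArray {
    int*        data     = nullptr;
    std::size_t size     = 0;
    std::size_t capacity = 0;

    void push_back(int value) {
        if (size == capacity) {
            // Out of room: allocate a completely new, larger block ...
            std::size_t new_capacity = (capacity == 0) ? 1 : capacity * 2;
            int* new_data = static_cast<int*>(std::malloc(new_capacity * sizeof(int)));
            if (data != nullptr) {
                std::memcpy(new_data, data, size * sizeof(int)); // ... copy the old contents ...
                std::free(data);                                 // ... and release the old block.
            }
            data = new_data;
            capacity = new_capacity;
        }
        data[size++] = value; // guaranteed free space at the end now
    }

    ~GrowableArray() { std::free(data); }
};
```

Because both the old and the new block exist simultaneously during the copy, the elements always end up in one contiguous block.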

Alex Martelli
Wouldn't that be inefficient, having to copy all the data to a new memory block?
Hooked
Not copying the data isn't an option: if the memory must be contiguous, and there is not enough free space after the current memory block to extend it, then the data must be copied to a new block. (If there is enough space after the current block, the memory allocation system won't do the copy - it will just extend the existing block.)
Jonathan Leffler
How does the implementation find a new block of memory sizable enough? Would it just have a pointer to a block of memory that is sizeof(vec) and then copy all the memory in order from there?
Hooked
`malloc()` always returns contiguous memory. If a large enough contiguous block can't be found, it will simply fail and return a null pointer.
Chuck
@Hooked: yes, it's inefficient. Two things can mitigate this, (1) when vector reallocates, it increases its capacity exponentially, so that it only takes log N resize ops to reach size N, and (2) use a deque instead, which does not require contiguous memory.
Steve Jessop
@Hooked: There is inefficient, and there is *inefficient*. With the "always grow by a factor of 2 (or more)" scheme, the amortized copying cost scales linearly with the number of elements, so it is not bad. The thing is: you can't make any real-time guarantees, because any given addition might trigger a big copy.
dmckee
+1  A: 

One thing to remember is that all modern operating systems have something called Virtual Memory. This allows any program the ability to assume that it has complete access to the largest possible amount of memory for that architecture, regardless of other processes, or even the amount of physical RAM in the computer. So when compiling, the compiler can automatically assign enough space on the stack (if the size needed is known at compile-time), or it can write the assembly to dynamically allocate memory on the heap as if the entire memory space were available. It's a level of abstraction that makes things incredibly easier.

What the OS does is take care of swapping the pages in and out of physical memory when they are needed for the various running programs. That way, no program needs to worry about the others, but you can still have simultaneously running programs.

I am of course vastly simplifying what is really going on, but I think this at least answers your questions on how the compiler can tell what memory is free; because it can assume that all memory starts out empty.

Sean Nyman
+2  A: 

Your question seems to be about how to understand the concept of memory allocation from a beginner's point of view. Let me try to explain what is going on in a very simplified manner. As an example we can think of a C++ program that adds a lot of elements to a std::vector.

When the program starts, the C++ runtime will call the operating system to allocate some memory. This piece of memory is called the heap, and it is used whenever dynamic memory is required by the C++ program. Initially the heap is mostly unused, but calls to new and malloc will carve out blocks of memory on the heap. Internally the heap uses some bookkeeping information to keep track of the used and free areas of the heap.

Exactly how std::vector behaves internally depends on the implementation, but in general it will allocate a buffer for the elements of the vector on the heap. This buffer is big enough to accommodate all the elements in the vector, but most of the time it has some free space at the end. Here is a buffer that stores 5 elements and has space enough for 8 elements. The buffer is located at address 1000 on the heap.

1000: X X X X X _ _ _

The std::vector keeps track of both the number of elements in the vector (5) and the size (8) and location (1000) of the buffer.

Here is the buffer after push_back is called to add a new element to the vector:

1000: X X X X X X _ _

That can be done two more times until all space has been used in the buffer.

1000: X X X X X X X X

But what happens if push_back is called once more? The vector has to increase the size of the buffer. The buffer is allocated on the heap and if the area right after the buffer is unused it may actually be possible to simply grow the buffer. However, most of the time the memory has been allocated to some other object. This is something the heap keeps track of. For the vector to be able to grow the buffer it has to allocate a completely new buffer with an increased size. Many implementations will simply double the size of the buffer. Here is the new buffer that now stores 9 elements and has room for 16 elements. The new buffer is allocated at address 2000 on the heap:

2000: X X X X X X X X X _ _ _ _ _ _ _

The contents of the old buffer are copied to the new buffer, and this operation can be costly if the buffer is big.

In case you wonder: the heap itself may also grow while the program is running, just as individual blocks allocated on the heap may grow. This increases the memory consumption of the program. As more and more elements are added to the vector, the heap will have to grow, until the operating system refuses to increase its size. When that happens the program will fail with an out-of-memory condition.

To sum up:

  • The operating system supplies memory for the heap, which can grow in size until the limits of the operating system have been reached.
  • The memory allocation routines in C++ can allocate and free blocks of a requested size on the heap.
  • The std::vector will preallocate a buffer to allow the vector to grow, but if the vector grows beyond the size of the buffer it will allocate a new buffer and copy the entire contents of the vector to this new buffer.
Martin Liversage
Excellent, thank you! Is it reasonable to assume that the reason most data structures are NOT contiguous is that it is 1) unnecessary and 2) very costly with many elements?
Hooked
It depends... Arrays and vectors are always contiguous. Otherwise element lookup would be too costly. Lists on the other hand are normally not contiguous. Element insertion is faster than in a vector, but element lookup is slower. When designing data structures you have to consider lookup time, insertion time, sorting time etc. and then decide how you will use memory.
Martin Liversage