views: 350
answers: 4
Following on from a previous question relating to heap usage restrictions, I'm looking for a good standard C++ class for dealing with big arrays of data in a way that is both memory efficient and speed efficient. I had been allocating the array using a single malloc/HeapAlloc, but after multiple tries with various calls I keep falling foul of heap fragmentation. So the conclusion I've come to, other than porting to 64 bit, is to use a mechanism that allows me to have a large array spanning multiple smaller memory fragments. I don't want an alloc per element as that is very memory inefficient, so the plan is to write a class that overrides the [] operator and selects the appropriate element based on the index. Is there already a decent class out there to do this, or am I better off rolling my own?

From my understanding, and some googling, a 32-bit Windows process should theoretically be able to address up to 2GB. Now assuming I have 2GB installed, and various other processes and services are hogging about 400MB, how much usable memory do you think my program can reasonably expect to get from the heap?
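
(For reference, the process can be asked directly how much of its own virtual address space is still free, which is a better guide than working back from installed RAM. A minimal sketch using the Win32 GlobalMemoryStatusEx call follows; the API is real, the reporting code around it is just illustrative, and fragmentation means the largest single allocation can be much smaller than the figure reported.)

#include <windows.h>
#include <cstdio>

void ReportAddressSpace()
{
    MEMORYSTATUSEX status;
    status.dwLength = sizeof(status);       // must be set before the call
    if (GlobalMemoryStatusEx(&status))
    {
        // ullTotalVirtual is the size of this process's user-mode address
        // space (~2GB for a normal 32-bit process); ullAvailVirtual is how
        // much of it is currently unreserved.
        printf("Total virtual: %I64u MB, available: %I64u MB\n",
               status.ullTotalVirtual / (1024 * 1024),
               status.ullAvailVirtual / (1024 * 1024));
    }
}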

I'm currently using various flavours of Visual C++.

Edit: As per Poita's post, I've tried a std::deque, using the following test on VS2008:

#include <deque>
using namespace std;
struct V                // 88 bytes per element
{
    double  data[11];
};

struct T                // 32 bytes per element
{
    long    data[8];
};


void    dequeTest()
{
    deque<V> VQ;
    deque<T> TQ;

    V defV;
    T defT;

    VQ.resize(4000000,defV);    // ~352MB of payload
    TQ.resize(8000000,defT);    // ~256MB of payload
}

The total memory for the above data comes out at 608MB were I to use a straight malloc or HeapAlloc, and takes < 1 second. The deque resizes took the process to 950MB originally, and then it slowly started dropping back. 15 minutes later, dequeTest() finished, with the process showing just 6MB of memory, which probably had more to do with the run-time than the container itself. I also tried populating the deque using various push options, but performance was so bad I had to break out early. I could possibly provide a better allocator than the default to get a much better response, but on the face of it deque is not the class for this job. Note this could also relate to the MS VS2008 implementation of deque, as there seems to be a lot in this class that is very implementation dependent when it comes to performance.

Time to write my own big array class, I reckon.

Second Edit: Allocating smaller amounts yielded 1.875GB immediately, using the following:

#define TenMB (1024*1024*10)

void    SmallerAllocs()
{
    size_t  Total = 0;
    LPVOID  p[200];                     // up to 200 x 10MB = ~2GB attempted
    for (int i = 0; i < 200; i++)
    {
        p[i] = malloc(TenMB);
        if (p[i])
            Total += TenMB;
        else
            break;
    }

    CString Msg;
    Msg.Format("Allocated %0.3lfGB", Total / (1024.0*1024.0*1024.0));
    AfxMessageBox(Msg, MB_OK);
}

Final edit: I have decided to accept Poita's post and the various comments following it, not because I'll be using the deque class directly, but more for the array-as-a-deck-of-cards notion in the comments that followed. This should be straightforward to implement with O(1) random element access, based on a fixed number of elements per block, which is what I need. Thanks to all for the feedback!
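
For anyone landing here later, here is a minimal sketch of the sort of block-backed array described above. The class and parameter names are mine, not from the finished implementation: each block is a separate heap allocation holding a fixed number of elements, so no single allocation needs to be huge, and operator[] stays O(1).

#include <cstddef>
#include <vector>

template <class T, std::size_t ElementsPerBlock = 65536>
class BlockArray
{
public:
    explicit BlockArray(std::size_t size) : size_(size)
    {
        // Round up; the last block may carry some unused slack.
        std::size_t blockCount = (size + ElementsPerBlock - 1) / ElementsPerBlock;
        blocks_.resize(blockCount);
        for (std::size_t b = 0; b < blockCount; ++b)
            blocks_[b] = new T[ElementsPerBlock];
    }

    ~BlockArray()
    {
        for (std::size_t b = 0; b < blocks_.size(); ++b)
            delete [] blocks_[b];
    }

    // One divide and one modulo locate the element; with a power-of-two
    // block size the compiler reduces these to shifts and masks.
    T& operator[](std::size_t i)
    {
        return blocks_[i / ElementsPerBlock][i % ElementsPerBlock];
    }
    const T& operator[](std::size_t i) const
    {
        return blocks_[i / ElementsPerBlock][i % ElementsPerBlock];
    }

    std::size_t size() const { return size_; }

private:
    BlockArray(const BlockArray&);              // non-copyable for brevity
    BlockArray& operator=(const BlockArray&);

    std::size_t size_;
    std::vector<T*> blocks_;
};

// Usage, mirroring the deque test above:
//     BlockArray<V> VQ(4000000);
//     VQ[1234567].data[0] = 1.0;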

+3  A: 

From the point of view of your program you always have 2GB available on startup, no matter what else is going on in the system. I don't believe Windows provides a way to detect whether your memory is being paged out to disk or not. As far as your data structures go, it sounds like you're describing something similar to how a deque is implemented in the STL.

tloach
I think you mean deque (http://www.cplusplus.com/reference/stl/deque/), not dequeue.
Bill
Right you are. Fixed.
tloach
+10  A: 

Have you tried using a std::deque? Unlike std::vector, which uses one huge heap allocation, deque usually allocates in small chunks, but still provides constant-time indexing via operator[].

Peter Alexander
I'll have a look at the deque implementation, but I'd be concerned about how small the chunks are. I'm dealing with many millions of relatively small structures, so any implementation that individually allocates memory on a per-item basis is liable to be prohibitively memory inefficient.
Shane MacLaughlin
I don't know what policy it uses for sizing, but it is certainly far more than one element per chunk. I doubt the memory overhead would be more than 10%, and I would expect it to be <5%.
Peter Alexander
After reading up on std::deque, specifically on using allocators, I can't find anything that states that it will not attempt to allocate all memory in a single contiguous block. See http://www.cplusplus.com/reference/std/memory/allocator/ Put another way, I can't find anything to suggest that adding 1GB of data to a deque is any more likely to succeed than using a HeapAlloc to do the same thing. Do you have any references to suggest that a deque will use multiple heap allocations for very large amounts of data, and if so how does it fragment them?
Shane MacLaughlin
IIRC, deques are implemented in blocks of a set size. So if the block size was 100 elements and you allocated 450 elements, you would have 5 blocks, which might happen to be contiguous but not necessarily so. The internal representation of each block would be contiguous, though.
graham.reeds
@Shane, it comes from the fact that deques require O(1) time to do a push_front, as well as doing operator[], and the only way to achieve that is by allocating in blocks.
Peter Alexander
@graham, Herb Sutter's argument supports what you are saying (http://www.gotw.ca/gotw/054.htm). Another useful post (http://www.dreamincode.net/forums/index.php?showtopic=35344) suggests that deque performance in terms of random access isn't that good, which is a big no-no for me.
Shane MacLaughlin
@Shane - Does acceptance of this answer imply that you tried this and it worked out for you after all? Your last comment was rather negative.
T.E.D.
@T.E.D., nope, std::deque didn't suit my exact requirements, but the answer and subsequent comments pushed me in the direction I needed to go. I'm currently implementing a solution as a virtual array represented internally as a secondary array of large blocks. Getting an element from a subscript is a simple matter of block number = subscript / number of elements per block, block position = subscript % number of elements per block. Deque-like but not a deque, and hopefully very fast and efficient. I'll post it up when finished and tested.
Shane MacLaughlin
Hmmm. It might be worth putting this in an answer yourself. I've done that for my own questions in the past. I'm pretty sure it's kosher.
T.E.D.
+4  A: 

Exactly how sparse is this array? If there are large amounts of empty (unused) space in it, you may want to take another approach. The answer to this question suggests an STL map.

If it isn't sparse (as mentioned in the comments), one thing you might look into since you are running on Windows is using a memory-mapped file. Although your OS may be 32-bit, your file system is not. This does of course mean there will be swapping going on though, which is liable to be quite a bit slower than if you could really put the whole darn thing in RAM.
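
As a rough illustration of the idea (the file name, sizes, and error handling are purely placeholders), you could back the data with a disk file and map a window of it at a time with the Win32 file-mapping API, remapping the view as you move through the data:

#include <windows.h>

void MappedFileSketch()
{
    const ULONGLONG totalBytes  = 4ULL * 1024 * 1024 * 1024;   // 4GB of data on disk
    const SIZE_T    windowBytes = 256 * 1024 * 1024;           // 256MB view at a time

    HANDLE hFile = CreateFile("bigarray.dat", GENERIC_READ | GENERIC_WRITE,
                              0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
        return;

    // The file is extended to totalBytes when the mapping is created.
    HANDLE hMap = CreateFileMapping(hFile, NULL, PAGE_READWRITE,
                                    (DWORD)(totalBytes >> 32),
                                    (DWORD)(totalBytes & 0xFFFFFFFF), NULL);
    if (hMap)
    {
        // Map one window; the offset must be a multiple of the system
        // allocation granularity (64KB). Remap a different window as needed.
        ULONGLONG offset = 0;
        void* view = MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS,
                                   (DWORD)(offset >> 32),
                                   (DWORD)(offset & 0xFFFFFFFF), windowBytes);
        if (view)
        {
            // ... treat 'view' as ordinary memory for this 256MB slice ...
            UnmapViewOfFile(view);
        }
        CloseHandle(hMap);
    }
    CloseHandle(hFile);
}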

Also, you really ought to consider knocking the system's RAM up to the max (3GB on 32-bit Windows I believe) to see if that fixes it for you. That should only cost you about $100, and you are spending way more than that in man-hours just worrying about this.

T.E.D.
No unused space at all, unfortunately. The data in question is a TIN network comprising 3D coordinates and the triangles linking them together, jointly representing a large irregular surface. Thanks for the link in any case; it may well be useful elsewhere.
Shane MacLaughlin
Having 2GB installed != 2GB of physical RAM available. Even with 4GB installed you may not have more than 2.5GB available if you have a graphics card with a lot of memory (if you're doing 3D work then this is a distinct possibility). Personally, I'd make sure you have 4+GB installed and a 64-bit OS.
graham.reeds
@graham - He already said he was using a 32-bit OS, and was asking what he could do short of upgrading to 64bit. That's why I suggested going up to 3GB. That being said, I agree that using a 64-bit OS would probably be a good (and relatively cheap) solution.
T.E.D.
Porting to 64 bit is certainly on the cards for later this year, but it is only part of the solution. We have a lot of clients with 32-bit laptops out there who aren't likely to upgrade any time soon, and a good 32-bit solution that solves a problem on their current hardware has real value.
Shane MacLaughlin
+1  A: 

std::deque does exactly what you're describing, but usually at the granularity of the OS page size (that is, the chunks it allocates are usually 4 kB).

If you're unhappy with deque's default performance, you might be able to write a custom allocator that grabs bigger chunks--that is, gets 1 MB or more at a time.
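
One way to interpret that suggestion is an arena that grabs 1MB slabs from the heap and serves the container's many small allocations out of them. This is only a sketch of the idea, with hypothetical names, not a drop-in replacement: deallocation is deferred until the arena is destroyed, and an older MSVC standard library may place extra requirements on stateful allocators.

#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Arena that bump-allocates from large slabs grabbed via malloc.
class SlabArena
{
public:
    explicit SlabArena(std::size_t slabBytes = 1024 * 1024)
        : slabBytes_(slabBytes), used_(slabBytes) {}

    ~SlabArena()
    {
        for (std::size_t i = 0; i < slabs_.size(); ++i) std::free(slabs_[i]);
        for (std::size_t i = 0; i < big_.size();   ++i) std::free(big_[i]);
    }

    void* allocate(std::size_t bytes)
    {
        bytes = (bytes + 15) & ~std::size_t(15);        // preserve 16-byte alignment
        if (bytes > slabBytes_)                         // oversized: give it its own block
        {
            void* p = std::malloc(bytes);
            if (!p) throw std::bad_alloc();
            big_.push_back(static_cast<char*>(p));
            return p;
        }
        if (used_ + bytes > slabBytes_)                 // current slab exhausted
        {
            char* s = static_cast<char*>(std::malloc(slabBytes_));
            if (!s) throw std::bad_alloc();
            slabs_.push_back(s);
            used_ = 0;
        }
        void* p = slabs_.back() + used_;
        used_ += bytes;
        return p;
    }

private:
    std::size_t slabBytes_, used_;
    std::vector<char*> slabs_, big_;
};

// Minimal C++03-style allocator that forwards to a shared SlabArena.
template <class T>
class arena_allocator
{
public:
    typedef T value_type;
    typedef T* pointer;               typedef const T* const_pointer;
    typedef T& reference;             typedef const T& const_reference;
    typedef std::size_t size_type;    typedef std::ptrdiff_t difference_type;
    template <class U> struct rebind { typedef arena_allocator<U> other; };

    explicit arena_allocator(SlabArena* a) : arena_(a) {}
    template <class U> arena_allocator(const arena_allocator<U>& rhs) : arena_(rhs.arena_) {}

    pointer allocate(size_type n, const void* = 0)
        { return static_cast<pointer>(arena_->allocate(n * sizeof(T))); }
    void deallocate(pointer, size_type) {}              // arena frees everything at once

    pointer address(reference x) const { return &x; }
    const_pointer address(const_reference x) const { return &x; }
    size_type max_size() const { return size_type(-1) / sizeof(T); }
    void construct(pointer p, const T& v) { new (p) T(v); }
    void destroy(pointer p) { p->~T(); }

    SlabArena* arena_;                                  // public so rebound copies can share it
};

template <class T, class U>
bool operator==(const arena_allocator<T>& a, const arena_allocator<U>& b) { return a.arena_ == b.arena_; }
template <class T, class U>
bool operator!=(const arena_allocator<T>& a, const arena_allocator<U>& b) { return a.arena_ != b.arena_; }

// Usage sketch:
//     SlabArena arena(1024 * 1024);           // 1MB slabs
//     arena_allocator<V> alloc(&arena);
//     std::deque<V, arena_allocator<V> > VQ(alloc);
//     VQ.resize(4000000, defV);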

As others have said, your process's virtual address space is completely independent of all other processes, so you can address 2GB no matter what else is going on in your system. The OS will swap your memory pages to/from disk as necessary to fit the constraints of the amount of installed memory and all the processes contending for it. This will happen at the 4 kB page size, independent of how big your chunks are.

Drew Hall