I recently wrote this post:
http://stackoverflow.com/questions/3737138/how-best-to-store-very-large-2d-list-of-floats-in-c-error-handling

Some suggested that I implement my 2D list-like structure of floats as a vector; others said a deque.

From what I gather, vector requires contiguous memory but is more efficient as a result. Obviously that would be desirable if possible.

Thus my question is: what's a good rule of thumb for how large a container can get, in terms of...

1. float
2. int

...before you should switch from a vector to a deque to avoid memory problems?

e.g. I'm looking for an answer like "At around 4 million floats or 8 million ints, you should switch..." ...if possible.

+4  A: 

Well, here are two opinions. The C++ standard says (23.1.1/2):

vector is the type of sequence that should be used by default.

list should be used when there are frequent insertions and deletions from the middle of the sequence.

deque is the data structure of choice when most insertions and deletions take place at the beginning or at the end of the sequence.

Herb Sutter argues the following (the article contains his rationale and a performance analysis):

I'd like to present an amiably dissenting point of view: I recommend that you consider preferring deque by default instead of vector, especially when the contained type is a class or struct and not a builtin type, unless you really need the container's memory to be contiguous.

James McNellis
But that doesn't really answer my question... if I understood the previous commenters correctly, at a certain size limit a vector would yield allocation errors/failures, while a deque would function fine by using non-contiguous memory. I'm using a built-in type in this case (float), but if there are size limitations attached to the vector, I want to have a feel for what they are...
Jason R. Mick
@Jason: That would be very application-specific. It depends on how your allocator performs and how you have allocated memory in your program. If your address space is heavily fragmented or you have already allocated a number of large blocks, allocation may fail for a much smaller block. I tend to agree with Herb Sutter's view that you should at least consider just using `deque` as a default.
James McNellis
@Jason: then the obvious answer is "when your vector operation throws a `std::bad_alloc`, it's time to fall back to a deque" ;)
jalf
@jalf: How about: "when you profile and see deque operations taking too much time, switch to vector"?
jamesdlin
@jamesdlin: Yeah. That advice works both ways, of course: you can profile and find that the vector reallocations are killing your performance and decide to switch to using a deque instead.
James McNellis
A: 

Again, there is no size limit above which deque is better than vector, or vice versa. Memory fragmentation implications are pretty much the same in either case, except when you have already done a huge number of allocations/deallocations and there is not enough contiguous space left for a big vector. But that case is very rare. Remember that the address space is per process (google "virtual memory"). And you can remedy the fragmentation problem by reserving memory for the vector (via the reserve method) before the cluttering takes place.
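For example, a minimal sketch of that reserve approach (the element count here is made up for illustration):

#include <cstddef>
#include <vector>

int main()
{
    std::vector<float> data;

    // Grab one contiguous block up front, before other allocations
    // can fragment the address space. 10 million floats is ~40 MB.
    const std::size_t n = 10 * 1000 * 1000;
    data.reserve(n);

    // push_back never reallocates while size() <= capacity(), so no
    // further large contiguous block is ever requested.
    for (std::size_t i = 0; i < n; ++i)
        data.push_back(static_cast<float>(i));
}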

The tradeoff is in terms of what you want to do with the structure. If it is basically immutable and you only want to access / overwrite it by index, go for vector.

Deque is for when you need to do insertions at the beginning as well as at the end, something vector cannot handle naturally (vector is only efficient when inserting at the end).

Herb Sutter's articles are in general of great quality, but you'll notice that when you do "number crunching" in C++, most of the stuff you're taught in "general C++" books must be taken with an extra word of caution. The poorer indexing performance of deques may well matter for your application. If it does, don't use deque.

Alexandre C.
There are "size limits" (relating to performance and to allocation failure); they just involve a lot of factors, some of which may depend on system load...
SamB
+1  A: 

If you need insertions at the beginning, then go with deque.

Otherwise, I always like to point to this article on vector vs. deque (in addition to those linked by James McNellis here). Assuming a deque implementation that uses page-based allocation, the article has good comparisons of allocation (and deallocation) time for vector with and without reserve() vs. deque. Basically, using reserve() makes vector allocation time very similar to deque's. Informative, and useful if you can guess the right value to reserve ahead of time.
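If you want to reproduce that kind of measurement yourself, here is a rough sketch (the element count, element type, and use of std::chrono are our choices, not the article's):

#include <chrono>
#include <cstddef>
#include <deque>
#include <iostream>
#include <vector>

// Time how long it takes to fill an already-constructed container
// with n elements, in milliseconds.
template<typename Container>
long long fill_ms(Container& c, std::size_t n)
{
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i != n; ++i)
        c.push_back(static_cast<typename Container::value_type>(i));
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
}

int main()
{
    const std::size_t n = 10 * 1000 * 1000;

    std::deque<float> d;
    std::vector<float> v1;
    std::vector<float> v2;
    v2.reserve(n);  // one up-front allocation instead of repeated growth

    std::cout << "deque:          " << fill_ms(d, n)  << " ms\n";
    std::cout << "vector:         " << fill_ms(v1, n) << " ms\n";
    std::cout << "vector+reserve: " << fill_ms(v2, n) << " ms\n";
}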

msbmsb
The same tests on something other than the MS implementation and the Windows platform are likely to produce totally different results. The complexity requirements in the standard are 1) asymptotic and 2) up to constants. Such articles are terribly misleading.
Alexandre C.
@Alexandre: Any C++ question is likely to have very different results on different platforms and with different compilers. An article like that at least makes you think about what is happening with the classes.
msbmsb
@msbmsb: ...which is really, really implementation-dependent! Doubly linked lists, circular buffers, and linked lists of buffers are all standard-compliant candidates for deque in terms of complexity requirements...
Alexandre C.
Fine, added an assumption "note".
msbmsb
@Alexandre: Neither a doubly linked list nor a linked list of buffers would give you amortized constant time random element access, which is required.
James McNellis
@James: for a linked list of buffers, you can use an extra array storing starting indices and pointers to the buffers (AFAIR that is what is done in the GNU libstdc++ implementation of deque). This makes retrieval an O(log n) process. For doubly linked lists you're right.
Alexandre C.
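For reference, a bare-bones sketch of that index-array scheme, assuming variable-sized buffers and a binary search over their starting indices (all names here are illustrative):

#include <algorithm>
#include <cstddef>
#include <vector>

// One node of the "linked list of buffers": 'first' is the global index
// of the buffer's first element; buffers may differ in size.
struct Segment
{
    std::size_t first;
    std::vector<float> buf;
};

// Orders a global index against a segment's starting index, for the
// binary search below.
struct FirstLess
{
    bool operator()(std::size_t i, const Segment& s) const
    {
        return i < s.first;
    }
};

// O(log n) retrieval: find the last segment whose 'first' is <= i,
// then index into its buffer.
// Precondition: segs is non-empty and i is less than the total size.
float get(const std::vector<Segment>& segs, std::size_t i)
{
    std::vector<Segment>::const_iterator it =
        std::upper_bound(segs.begin(), segs.end(), i, FirstLess());
    --it;
    return it->buf[i - it->first];
}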
+1  A: 

There are so many factors to consider that it's impossible to give a clear answer: the amount of memory on the machine, how fragmented it is, how fragmented it may become, and so on. My suggestion is to just choose one and use it. If it causes problems, switch. Chances are you aren't going to hit those edge cases anyway.

If you are truly worried, you could implement a sort of pseudo-PIMPL that tries the vector first and falls back to the deque on failure:

#include <cstddef>  // size_t
#include <deque>
#include <new>      // std::bad_alloc
#include <vector>

template<typename T>
class VectorDeque
{
private:
  enum TYPE { NONE, DEQUE, VECTOR };
  std::deque<T> m_d;
  std::vector<T> m_v;
  TYPE m_type;
  // ... other members and forwarding functions (operator[], size(), etc.)
public:
  VectorDeque() : m_type(NONE) {}

  void resize(std::size_t n)
  {
    switch (m_type)
    {
      case NONE:
        try
        {
          m_v.resize(n);        // first allocation: try contiguous storage
          m_type = VECTOR;
        }
        catch (const std::bad_alloc&)
        {
          m_d.resize(n);        // fall back to paged storage
          m_type = DEQUE;
        }
        break;
      case VECTOR:
        try
        {
          m_v.resize(n);
        }
        catch (const std::bad_alloc&)
        {
          // Migrate to the deque (this may itself throw if memory is
          // truly exhausted), then release the vector's block.
          m_d.assign(m_v.begin(), m_v.end());
          std::vector<T>().swap(m_v);
          m_d.resize(n);
          m_type = DEQUE;
        }
        break;
      case DEQUE:
        m_d.resize(n);
        break;
    }
  }
};
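Used like a normal container, it falls back transparently when the contiguous allocation fails:

VectorDeque<float> vd;
vd.resize(100 * 1000 * 1000);  // switches to the deque internally if the
                               // vector's resize throws std::bad_alloc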

But this seems like total overkill. Have you done any benchmarking or tests? Do you have any reason to believe that memory allocations will fail or that deques will be too slow?

Niki Yoshiuchi
A: 

Well, regarding memory, I can share some experience that may help you decide at what point contiguous memory blocks (malloc or std::vector) become too large:

The application I work with records measurement data, mostly 4-byte floats, and allocates internal buffers to store the data. These buffers vary heavily in size, but a typical set might be, say, several dozen of 1-10 MB and a very few of >100 MB. The buffers are always allocated with calloc, i.e. as one big chunk of memory. If a buffer allocation fails, an error is logged and the user has the choice to try again.

Buffer sizes: say you want to record 1000 channels at 100 Hz for 10 minutes: 4 bytes x 1000 channels x 100 Hz x 600 s == 228 MB (approx.) ... or 100 channels at 10 Hz for ~3 hours == 41 MB
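The same arithmetic as a quick throwaway check (not part of the application described above):

#include <iostream>

int main()
{
    const double mib = 1024.0 * 1024.0;
    // 1000 channels at 100 Hz for 10 minutes, 4-byte samples:
    std::cout << 4.0 * 1000 * 100 * 600 / mib << " MB\n";   // ~228.9
    // 100 channels at 10 Hz for ~3 hours:
    std::cout << 4.0 * 100 * 10 * 10800 / mib << " MB\n";   // ~41.2
}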

We (nearly) never had any problems allocating 40 MB buffers (that's about 10 million floats), while the 200-300 MB buffers fail from time to time -- all this on normal WinXP/32-bit boxes with 4 GB RAM. (Note that a 32-bit Windows process normally gets only 2 GB of user address space regardless of installed RAM, so finding a few hundred MB of contiguous free space gets harder as the address space fragments.)

Martin
A: 

Given that you don't insert after creation, you should probably either use plain old std::vector or, if fragmentation really does become an issue, a custom vector-like Sequence implemented as a vector or array of pointers to fixed-size arrays, as sketched below.
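For illustration, a bare-bones sketch of that pointer-to-chunks idea (the chunk size and all names here are made up):

#include <cstddef>
#include <memory>
#include <vector>

// Fixed-size chunks reached through a vector of pointers: the elements
// never need one huge contiguous block, yet index access stays O(1).
class ChunkedFloats
{
    static const std::size_t kChunk = 65536;  // floats per chunk
    std::vector<std::unique_ptr<float[]> > chunks_;
    std::size_t size_;

public:
    ChunkedFloats() : size_(0) {}

    void push_back(float v)
    {
        if (size_ % kChunk == 0)  // current chunk is full (or none yet)
            chunks_.push_back(std::unique_ptr<float[]>(new float[kChunk]));
        chunks_[size_ / kChunk][size_ % kChunk] = v;
        ++size_;
    }

    float& operator[](std::size_t i)
    {
        return chunks_[i / kChunk][i % kChunk];
    }

    std::size_t size() const { return size_; }
};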

SamB
+1  A: 

You switch when testing and profiling indicate that one is worse than the other for your application. There is no universal "at around N floats or M ints" answer.

Steve M