While writing a post about Project Euler's 14th problem I ran into a difference in behaviour between VC9 and VC10.

The following code runs fine in VC9, but in VC10 std::unordered_map throws a bad_alloc exception. The strange thing is that if I recover from the exception, future allocations succeed (the size of the container continues to grow). Also, if I use boost::unordered_map it works fine with both compilers.

Regarding the actual memory usage: I'm running on a machine with 4 GB of RAM (1.7 GB in use). The VC9 version gets up to ~810 MB of memory before completing the task, and the VC10 one crashes at ~658 MB.

Is this a bug in VC10? I'm running on the same machine, so what else could cause memory to consistently run out in one version and not in the other when the amount of work done is identical?

<edit>
Some more information: the first time the exception occurs is when calculating 7,718,688, with a stack depth of 1 (no recursion, just main->length). After that it seems to happen for each number that is added to the cache. The cache had 16,777,217 elements in it before the exception happened (according to cache.size()). Interestingly, even when insert fails the cache's size grows by one, so it appears that it doesn't provide the strong exception guarantee (in violation of §23.2.1.11).
</edit>
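
For reference, a sketch of the check behind that last observation (the try/catch wrapper is mine, added for diagnosis; it is not part of the program below):

size_t before = cache.size();
try {
    cache.insert(std::make_pair(n, len));
} catch (const std::bad_alloc&) {
    // With the strong guarantee size() would still equal 'before' here;
    // in the failing VC10 runs it comes back one larger.
    std::cout << "insert threw: size " << before
        << " -> " << cache.size() << std::endl;
    throw;
}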

Code follows:

#include <iostream>
#include <unordered_map>

typedef std::unordered_map<_int64, int> cache_type;

_int64 collatz(_int64 i)
{
    // One Collatz step: 3n+1 for odd n, n/2 for even n.
    return (i&1)? i*3+1 : i/2;
}

int length(_int64 n, cache_type& cache)
{
    if (n == 1)
        return 1;

    // Memoization: reuse the orbit length if we've already computed it.
    cache_type::iterator found = cache.find(n);
    if (found != cache.end())
        return found->second;
    int len = length(collatz(n), cache) + 1;
    cache.insert(std::make_pair(n, len)); // this sometimes throws
    return len;
}

int main(int argc, char** argv)
{
    const int limit = 10000000;
    cache_type cache;
    std::pair<int, int> max = std::make_pair(0, 0);
    for (int i = 2; i <= limit; ++i) {
        int len = length(i, cache);
        if (len > max.second)
            max = std::make_pair(i, len);
    }

    std::cout<< "Number with longest orbit is " << max.first 
        << " with a lenght of " << max.second 
        << " cache size is " << cache.size() << std::endl;
}

<edit>
Also, can anyone reproduce this behaviour? At one point it disappeared (and re-appeared), so there may be something special about my configuration.
</edit>

A: 

Does _int64 have alignment requirements that the map may not be honoring in allocation?

Try using long long int instead and see if the behavior changes.

holtavolt
Initially I used `long long` (and had the same problem) but changed it to `_int64` for readability.
Motti
Interesting problem, and it certainly seems like a VC10 regression. All this code is in the same compilation unit (no DLL boundaries?). It does sound like a heap allocation failure, but just to rule it out, can you crank up the stack reserve size in the link settings? That is a default that may have changed (or stack behaviour may be different with new VC10 features). Do you know whether make_pair or insert is throwing? Good luck!
holtavolt
The code is as you see it: one exe, no DLLs, no separate translation units for the linker. The exception takes place in `insert`, which makes sense since `make_pair` doesn't allocate any memory. As for changing the stack reserve size, I tried that and got strange results. The exception went away; I changed it back to the default and it didn't re-appear; then I rebooted the machine and the problem returned (both with the default 1 MB stack and with 2 MB)...
Motti
Wow - strange stuff. The only other things I'd try would be some of the heap allocation flags, and/or a 64-bit build if it's currently 32-bit. It sounds like an MS bug based on what you've seen so far, so you might check the MSDN boards.
holtavolt
It would be better to include <cstdint> and use int64_t because _int64 is Microsoft-only but int64_t is standard.
Neil Mayhew
A: 

You're blowing the stack in the deeply recursive call to length().

John
I don't think you're right: I'm getting a `bad_alloc` exception, and the stack depth is the same for VC10 and VC9.
Motti
Also the same code with `boost::unordered_map` and `std::map` works fine with the same stack depth.
Motti
You're right. The recursion depth maxes out at about 400 for the given input. The cache size is about 22 million items. boost::unordered_map appears to have an overhead of roughly 16 bytes per item for this (28 bytes per item total), so you're looking at a cache footprint of about 608 MB. Would you abort() or break at the point of the bad_alloc, and tell us what the cache size, stack size, stack depth, heap usage, etc. are?
John
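
A sketch of how to capture the depth figure mentioned above (the extra depth parameter and its default value are hypothetical additions to the length() from the question):

int length(_int64 n, cache_type& cache, int depth = 1)
{
    static int max_depth = 0;
    if (depth > max_depth)
        max_depth = depth; // peaks at ~400 for this input; inspect in the debugger

    if (n == 1)
        return 1;

    cache_type::iterator found = cache.find(n);
    if (found != cache.end())
        return found->second;
    int len = length(collatz(n), cache, depth + 1) + 1;
    cache.insert(std::make_pair(n, len));
    return len;
}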
Hi John, I've updated my question with some more information. BTW, when you comment on your own answer I don't get notified unless you put my handle in the comment (@Motti).
Motti
+1  A: 

Inserting a single element could result in a large memory allocation if the map's hash table needs to be resized. The map seems to be about 0.5GB at the end of the run. (See my comment above.)

There is presumably some heuristic being used to decide how much to expand the hash table when it needs to grow, and this could conceivably be to double it each time. This would therefore use ~1.5GB for old + new data while the hash table is being copied.

It's therefore possible your program is hitting a limit on process memory size. (See comment again.) If so, it's possible that VC10 takes slightly more memory overall than VC9, and that slightly different amounts of memory get allocated on different runs or builds of the program, so that VC10 hits the limit sometimes while VC9 doesn't ever hit it.
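
If rehashing is indeed the trigger, pre-sizing the bucket table should remove the doubling spike. A minimal sketch, assuming the figure of roughly 22 million items from the comments:

cache_type cache;
// Hypothetical: request enough buckets up front (2^25 > 22 million) so
// the table never has to double mid-run.
cache.rehash(1 << 25);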

Neil Mayhew
Are you sure that resizing an unordered_map means copying all the memory? AFAIK an unordered_map is **not** contiguous, and resizing it should not be dependent on the amount of data currently in the map.
Motti
Yes, this occurred to me too, some time after I'd written this answer. However, I've checked the documentation for unordered_map on MSDN and there is still a hash table involved. It groups the map elements into buckets, and each bucket is effectively a std::list. The buckets are stored in a hash table. So the unordered_map implementation will need to grow that table as the map grows, although the amount of memory reallocation won't be as much as I said. There are several methods available for checking things like the number of buckets, so you could use these to investigate the storage pattern.
Neil Mayhew
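
For example, this is the kind of instrumentation meant above (a sketch; print it every million insertions or so and watch bucket_count() jump at each rehash):

std::cout << "size: " << cache.size()
    << " buckets: " << cache.bucket_count()
    << " load factor: " << cache.load_factor()
    << " (max " << cache.max_load_factor() << ")" << std::endl;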
You could define a custom allocator and use it with cache_type to trace the memory allocations and deallocations. In particular, you could find out the size of the memory allocation that fails. To avoid changing the problem behaviour you could implement the custom allocator as a decorator for the standard allocator.
Neil Mayhew
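
A sketch of that tracing-decorator idea (the names here are hypothetical). It forwards everything to std::allocator and prints each allocation size, so the size of the request that finally fails can be seen before the bad_alloc escapes:

#include <functional>
#include <iostream>
#include <memory>
#include <unordered_map>

template <typename T>
struct tracing_allocator : std::allocator<T>
{
    typedef std::allocator<T> base;
    typedef typename base::pointer pointer;
    typedef typename base::size_type size_type;

    template <typename U> struct rebind { typedef tracing_allocator<U> other; };

    tracing_allocator() {}
    template <typename U> tracing_allocator(const tracing_allocator<U>&) {}

    pointer allocate(size_type n, const void* hint = 0)
    {
        std::cout << "allocating " << n * sizeof(T) << " bytes" << std::endl;
        return base::allocate(n, hint);
    }
};

// Hash and equality must be spelled out to reach the allocator parameter.
typedef std::unordered_map<_int64, int, std::hash<_int64>,
    std::equal_to<_int64>,
    tracing_allocator<std::pair<const _int64, int> > > traced_cache_type;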
It would also be helpful to know the value of n when cache.insert throws, to see how far through the problem the program is when it fails. If it's very close to the end, this might help to explain why VC9 makes it all the way and VC10 doesn't.
Neil Mayhew
I've updated my question with some more information. BTW, when you comment on your own answer I don't get notified unless you put my handle in the comment (@Motti).
Motti
@Motti I tried the custom-allocator approach (on Linux) and could see a very obvious pattern. By the end, there had been 21,730,848 16-byte allocations (the hash nodes) and a series of single larger allocations doubling in size and ending up at 145,893,776 bytes (the bucket table). This still only gives a maximum of around 0.5 GB. However, based on what @ngoozeff says about _SECURE_SCL and my own experience with it, it puts a LOT more debugging information into memory blocks, so it could easily cause memory usage to balloon in a program like this. I'll post the custom allocator code soon.
Neil Mayhew
A: 

1 - Check the EventLog to see if there are any events about a process going over its allowed quota.

2 - If you are on a 32-bit OS, try starting it with 3 GB for user space (the /3GB switch).

3 - Look to see if you have different allocators available.

4 - Diff unordered_map in 9.0 and 10.0 and its inlined file, on the off-chance that there's an artificial size limiter added ("security features" :-). It would most probably be in a macro with different values for x86 and x64 builds.

5 - Try putting a light wrapper around the allocator and just print the sizes for each allocation. That will also tell you whether it's really the allocator that's throwing or something before it.

6 - If it's the allocator that's throwing, look at the actual WinNT API calls made from it (and diff with 9.0 again).

7 - Try pre-allocating a huge block (say 1 GB); see the sketch below.

ZXX
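
To make item 7 concrete, a hypothetical probe (place it at the top of main, before the loop):

#include <new> // for std::nothrow

char* probe = new(std::nothrow) char[1024u * 1024u * 1024u]; // ~1 GB
std::cout << (probe ? "1 GB block: OK" : "1 GB block: failed") << std::endl;
delete[] probe;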
+1  A: 

It might be incidental, but changing the value of _SECURE_SCL causes the behaviour you describe.

i.e. compiling with:

cl /EHa /MD /D_SECURE_SCL=1 /Ox /c t1.cpp
link /LIBPATH:"c:/Program Files/Microsoft Visual Studio 10.0/VC/lib" /LIBPATH:"C:/Program Files/Microsoft SDKs/Windows/v7.0A/Lib" t1.obj

crashes, but the same commands with _SECURE_SCL=0 run to completion on my XP 32-bit machine. The MSDN page for _SECURE_SCL says it's enabled for debug builds but not release builds, which might be important if you're building under the IDE.
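
For a quick test from the IDE the macro can also be forced in code. Note that it must have the same value in every translation unit that shares STL objects, so this is only a sketch for a single-file experiment:

#define _SECURE_SCL 0 // must come before any standard header
#include <unordered_map>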

ngoozeff
I'll look into it, although I've been building with the default release flags from the IDE.
Motti
_SECURE_SCL adds a lot of debugging checks to STL code, so it would cause memory usage to increase considerably. (See other comments.)
Neil Mayhew