I have a C++ program which, during execution, allocates about 3-8 GB of memory to store a hash table (I use tr1/unordered_map) and various other data structures.

However, at the end of execution, there will be a long pause before returning to shell.

For example, at the very end of my main function I have

std::cout << "End of execution" << std::endl;

But the execution of my program will go something like

$ ./program
do stuff...
End of execution
[long pause of maybe 2 min]
$ -- returns to shell

Is this expected behavior or am I doing something wrong?

I'm guessing that the program is deallocating the memory at the end. But commercial applications that use large amounts of memory (such as Photoshop) do not exhibit this pause when you close them.

Please advise :)

Edit: The biggest data structure is an unordered_map keyed by string, storing a list of integers per key.

I am using g++ -O2 on Linux; the computer I am using has 128 GB of memory (most of it free). There are a few giant objects.

Solution: I ended up getting rid of the hashtable since it was almost full anyways. This solved my problem.

+7  A: 

Yes, deallocating memory can take some time, and you may also have other code executing, such as destructors being called. Photoshop does not use 3-8 GB of memory.

Also, you should perhaps add profiling to your application to confirm it is the deallocation of memory and not something else.

Brian R. Bondy
+3  A: 

Normally, deallocating memory as a process ends is not taken care of as part of the process, but rather as an operating system cleanup function. You might try something like valgrind to make sure your memory is being dealt with properly. However, the compiler also does certain things to set up and tear down your program, so some sort of performance profiling, or using a debugger to step through what is taking place at teardown time might be useful.

WhirlWind
Thanks, I will try to see what shows up in valgrind.
jm1234567890
+10  A: 

If the data structures are sufficiently complicated when your program finishes, freeing them might actually take a long time.

If your program actually must create such complicated structures (do some memory profiling to make sure), there probably is no clean way around this.

You can shortcut that freeing of memory with a dirty hack - at least on those operating systems where all memory allocated by a process is automatically freed when the process terminates.

You would do that by directly calling the libc's exit(3) function or the operating system's _exit(2). However, I would be very careful about verifying this does not short-circuit any other (important) cleanups some C++ destructor code might be doing. And what this does or does not do is highly system dependent (operating system, compiler, libc, the APIs you were using, ...).
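As a sketch of that dirty hack (assuming a POSIX-like system and C++11; `fast_exit` is an illustrative name, not a standard function):

```cpp
#include <cstdio>
#include <cstdlib>
#include <iostream>

// Hypothetical helper: flush what we still care about, then terminate
// without running destructors, atexit handlers, or heap teardown.
// The OS reclaims the whole address space in one shot.
void fast_exit(int status) {
    std::cout.flush();      // flush the iostreams we know about
    std::fflush(nullptr);   // flush all open stdio streams
    std::_Exit(status);     // C++11 equivalent of _exit(2): no cleanup at all
}
```

The two flush calls are exactly the kind of "other (important) cleanup" mentioned above: anything not explicitly done before the call simply never happens.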

ndim
What could go wrong with exit(2)? Assuming I don't have any special destructors. Could that potentially leave memory that cannot be allocated again?
jm1234567890
A call to exit() kills your app and frees its memory, but it might not free open file handles, forcing you to reboot the box. I've seen that happen on Solaris. Also, it makes no sense to assume no special destructors. It has to be your destructors. I'm almost sure of it.
luis.espinal
exit() will take care of freeing memory and file descriptors, but if you have destructors that do things like removing temporary files, then that may not happen if you call exit().
mark4o
I think in some implementations exit() will call global destructors. You may need to use _exit() instead.
Zan Lynx
Be sure your files are closed first, to flush unwritten I/O.
Potatoswatter
exit() will call destructors for global and static objects, but not destructors for objects on the stack or heap. _exit() will call no destructors at all. Either way, the OS will ensure that all memory is reclaimed.
mark4o
+4  A: 

Freeing memory may well take time - data structures are being updated. How much time depends on the allocator being used.

Also there might be more than just memory deallocation going on - if destructors are being executed, there may be a lot more than that going on.

2 minutes does sound like a lot of time, though - you might want to step through the cleanup code in a debugger (or use a profiler if that's more convenient) to see what's actually taking all the time.

Michael Burr
+2  A: 

Sorry, but this is a terrible question. You need to show the source code showing the specific algorithms and data structures that you are using.

It could be the deallocation, but that's just a wild guess. What are your destructors doing? Maybe it is paging like crazy. Just because your application allocates X amount of memory, that doesn't mean it will get it. Most likely it will be paging off virtual memory. Depending on the specifics of your application and OS, you might be doing a lot of page faults.

In such cases, it might help to run iostat and vmstat in the background to see what the heck is going on. If you see a lot of I/O, that's a sure sign you are page faulting. I/O operations will always be more expensive than memory ops.

I would be very surprised if indeed all that lapsed time at the end is purely due to de-allocation.

Run vmstat and iostat as soon as you get the "ending" message, and look for any indications of I/O going bananas.

luis.espinal
Thanks! I will also have a look at that. I have updated the question with the specific data structure.
jm1234567890
+5  A: 

I can't imagine how you'd use enough memory for it to matter, but one way I sped up a program was to use boost::object_pool to allocate memory for a binary tree. The major benefit for me was that I could just make the object pool a member variable of the tree, and when the tree went out of scope or was deleted, the object pool was destroyed all at once (letting me avoid writing a recursive destructor for the nodes). object_pool does call all of its objects' destructors at exit, though. I'm not sure whether it handles empty destructors in a special way or not.

If you don't need your allocator to call a constructor, you can also use boost::pool, which I think may deallocate faster because it doesn't have to call destructors at all and just releases the chunk of memory in one free().

Brendan Long
+2  A: 

when your program exits the destructors of all the global objects are called. if one of them takes a long time, you will see this behavior.

look for global objects and investigate their destructors.

Omry
+3  A: 

The time is probably not entirely spent deallocating memory; much of it goes to calling all the destructors. You can provide your own allocator that does not call the destructor (if the objects in the map don't need to be destructed, only deallocated).

Also take a look at this other question: http://stackoverflow.com/questions/2572533/c-stl-conforming-allocators
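As an illustration of the idea (not the code from the linked question), a minimal C++11 sketch of an allocator whose destroy() is a no-op might look like this; it is only safe when skipping the element destructors is genuinely harmless:

```cpp
#include <memory>
#include <vector>

// Sketch: an allocator that skips per-element destructor calls.
// std::allocator_traits uses a member destroy() when one exists, so
// container teardown never invokes ~U(). Only sane for elements whose
// destructors are trivial or otherwise safe to skip.
template <class T>
struct no_destroy_allocator : std::allocator<T> {
    template <class U> struct rebind { using other = no_destroy_allocator<U>; };
    no_destroy_allocator() = default;
    template <class U>
    no_destroy_allocator(const no_destroy_allocator<U>&) {}
    template <class U> void destroy(U*) { /* intentionally do nothing */ }
};
```

Plugged into a container, e.g. `std::vector<int, no_destroy_allocator<int>>`, the container still allocates and deallocates normally but never runs element destructors on clear or teardown.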

baol
+6  A: 

(I started this as a reply to ndim, but it got too long)

As ndim already posted, termination can take a long time.
Likely reasons are:

  • you have lots of allocations, and parts of the heap are swapped to disk.
  • long running destructors
  • other atexit routines
  • OS specific cleanup, such as notifying DLLs of thread & process termination on Windows (I don't know what exactly happens on Linux.)

exit is not the worst workaround here; however, actual behavior is system-dependent. E.g., exit on Windows / MSVC CRT will run global destructors / atexit routines, then call ExitProcess, which does close handles (but not necessarily flush them - at least it's not guaranteed).

Downsides: Destructors of heap allocated objects don't run - if you rely on them (e.g. to save state), you are toast. Also, tracking down real memory leaks gets much harder.

Find the cause - you should first analyze what is happening.

E.g., by manually freeing the root objects that are still allocated, you can separate the deallocation time from other process cleanup. Memory is the likely cause according to your description, but it's not the only possible one. Some cleanup code deadlocking before it runs into a timeout is possible, too. Monitoring stats (such as CPU / swap activity / disk use) can give clues.

Check the release build - debug builds usually use extra data on the heap that can immensely increase cleanup cost.

Different allocators
If deallocation is the problem, you might benefit a lot from using custom allocation mechanisms. Example: if your map only grows (items are never removed), an arena allocator can help a lot. If your lists of integers have many nodes, switch to a vector, or use a rope if you need random insertion.
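For illustration, a minimal bump/arena allocator along those lines might look like the sketch below. It is a toy, not a production allocator: it assumes no single allocation exceeds one block and that the stored objects need no destructors.

```cpp
#include <cstddef>
#include <vector>

// Minimal arena: hands out pointers into large blocks and frees the
// blocks wholesale. Teardown cost is O(number of blocks), not
// O(number of objects), which is the whole point.
class Arena {
    std::vector<char*> blocks_;
    std::size_t used_ = 0;
    static constexpr std::size_t kBlock = 1 << 20;  // 1 MiB chunks
public:
    void* allocate(std::size_t n) {
        // Round up to max alignment (assumes n <= kBlock).
        n = (n + alignof(std::max_align_t) - 1)
            & ~(alignof(std::max_align_t) - 1);
        if (blocks_.empty() || used_ + n > kBlock) {
            blocks_.push_back(new char[kBlock]);
            used_ = 0;
        }
        void* p = blocks_.back() + used_;
        used_ += n;
        return p;
    }
    ~Arena() {
        for (char* b : blocks_) delete[] b;  // a handful of frees, not millions
    }
};
```

There is deliberately no per-object free: individual objects live until the arena dies.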

peterchen
+1 for custom allocators; imo, with the correct allocators and object pooling, the number of system deallocations could be reduced to the point where it no longer causes problems, removing the need for hacky exit calls
Necrolis
+1  A: 

The objects in memory are organized in a heap. They are not deleted all at once; they are deleted one by one, and the cost of deleting an object is O(log n). Freeing them takes loooong.

The answer is then, yes, it takes so much time.

modosansreves
Memory allocators are complicated, and the heap is not literally a heap. Hence its being renamed the "free store" in C++.
Potatoswatter
+6  A: 

Certainly it's possible.

About 7 years ago I had a similar problem on a project; there was much less memory, but computers were slower too, I suppose.

We had to look at the assembly language for free in the end to work out why it was so slow, and it seemed that it was essentially keeping the freed blocks in a linked list so they could be reallocated, and was also scanning that list looking for blocks to combine. Scanning the list was an O(n) operation, but freeing n objects turned it into O(n^2).

Our test data took about 5 seconds to free the memory, but some customers had about 10 times as much data as we ever used, and it was taking 5-10 minutes to shut down the program on their systems.

We fixed it, as has been suggested, by just terminating the process instead and letting the operating system clear up the mess (which we knew was safe to do in our application).

Perhaps you have a more sensible free function than we had several years ago, but I just wanted to post that it's entirely possible if you have many objects to free and an O(n) free operation.

John Burton
+1  A: 

You can avoid free being called on an object by using an explicit destructor call, my_object->~my_class(), instead of delete my_object. You can avoid free for all objects of a class by overriding operator delete with an empty body, operator delete( void * ) {}, inside the class. Derived classes with virtual destructors will inherit that operator delete; otherwise you can copy-paste it (or maybe use using base::operator delete;).

This is much cleaner than calling exit. Just be sure you don't need that memory back!
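A small sketch of the explicit-destructor-call trick (the names and the static slab are illustrative; the point is that the destructor runs but the storage is deliberately never returned to the heap):

```cpp
#include <new>
#include <string>

struct Node {
    std::string payload;
};

// One static slot standing in for "memory we never intend to free".
alignas(Node) static unsigned char slab[sizeof(Node)];

Node* make_node(const std::string& s) {
    return new (slab) Node{s};   // placement new: no heap allocation here
}

void destroy_node(Node* p) {
    p->~Node();                  // destructor runs; storage is never freed
}
```

After destroy_node, the slab can be reused for a fresh placement new, but ::operator delete (and hence free) is never reached.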

Potatoswatter
A: 

In my experience, the calls to free or delete should not take a significant amount of time. That said, I have seen plenty of cases where it does take non-trivial time to destruct objects because of destructors that did non-trivial things. If you can't tell what's taking time during the destruction, use a debugger and/or a profiler to determine what's going on. If the profiler shows you that it really is calls to free() that take a lot of time, then you should improve your memory allocation scheme, because you must be creating an extremely large number of small objects.

As you noted, plenty of applications allocate large amounts of memory and incur no significant delay during shutdown, so there's no reason your program can't do the same.

karunski
A: 

I would recommend (as some others have) a simple forced process termination, if you're certain that you've nothing left to do but free memory (for example, no file I/O and such left to do).

The thing is that when you free memory, it typically isn't actually returned to the OS - it's held in a list to be reallocated, and this is obviously slow. However, if you terminate the process, the OS will reclaim all your memory at once, which should be substantially faster. As others have said, though, if you have any destructors that need to run, you should ensure that they run before forcibly calling exit() or ExitProcess or any such function.

What you should be aware of is that deallocating memory that is spread out (e.g., two nodes in a map) is much slower, due to cache effects, than deallocating memory in a vector, because the CPU needs to touch the memory to free it and run any destructors. If you deallocate a very large amount of very fragmented memory, you could be falling afoul of this and should consider switching to more contiguous structures.

I actually had a problem where allocating memory was faster than de-allocating it, and after allocating memory and then de-allocating it, I had a memory leak. Eventually, I worked out that this is why.

DeadMG
+1  A: 

I guess your unordered map is a global variable whose constructor is called at process startup and whose destructor is called at process exit.

How could you know if the map is guilty?

You can test whether your unordered_map is responsible (and I guess it is) by allocating it with new and, well, ahem... forgetting to delete it.

If your process' exit goes faster, then you have your culprit.
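A sketch of that experiment (using the C++11 header; the tr1 version is analogous, and `make_table` / `BigMap` are illustrative names):

```cpp
#include <list>
#include <string>
#include <unordered_map>  // the question used <tr1/unordered_map>; same idea

typedef std::unordered_map<std::string, std::list<int> > BigMap;

// Deliberately leaked: nobody ever calls delete on the returned pointer,
// so ~BigMap() - and its millions of per-node frees - never runs.
// The OS reclaims the pages wholesale at process exit.
BigMap* make_table() {
    return new BigMap;
}
```

If process exit becomes fast with this change, the map's teardown was indeed the culprit.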

Why is this so sloooooow?

Now, just by reading your post, for your unordered map, I see potential allocations for:

  • the allocated buffer of each string key
  • the list nodes (each integer stored in a separately allocated node)
  • the unordered map items + the bucket array

If you have 3-8 GB of data in this unordered map, each of the items above will need some kind of new and delete. And if you free every item, one by one, it can take time.

Other reasons?

Note that if you add items to your map one by one while your process executes, the individual news are barely perceptible... But the moment you want to clean everything up, all your allocated items must be destroyed at the same time, which could explain the perceived difference between construction/use and destruction...

Now, the destructors could take time for an additional reason.

For example, on Visual C++ 2008 in debug mode, upon destruction of STL iterators, the destructor verifies that the iterators are still valid. This caused quite a slowdown upon my object's destruction (which was basically a tree of nodes, each node having a list of child nodes, with iterators everywhere).

You are working with gcc, so perhaps it has its own debug checks, or perhaps your destructors are doing additional work (e.g. logging?)...

paercebal