views: 246

answers: 5
I'm building a large RTree (spatial index) full of nodes. It needs to be able to handle many queries AND updates. Objects are continuously being created and destroyed. The basic test I'm running is to see how the performance of the tree changes as the number of objects in the tree increases. I insert from 100 to 20,000 uniformly sized, randomly located objects in increments of 100. Searching and updating are irrelevant to the issue I'm currently facing.
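Roughly, the benchmark loop looks like this (a simplified sketch; RTree, Rect and insert are stand-ins for the actual classes, not the real code):

    #include <cstdlib>
    #include <ctime>
    #include <iostream>
    #include <vector>

    // Simplified stand-ins for the real spatial index types.
    struct Rect { float x, y, w, h; };
    struct RTree { void insert(const Rect&) { /* real insertion logic */ } };

    int main()
    {
        for (int count = 100; count <= 20000; count += 100)
        {
            RTree tree;
            std::vector<Rect> objects;
            objects.reserve(count);
            for (int i = 0; i < count; ++i)
            {
                // Uniformly sized, randomly located objects.
                Rect r = { float(std::rand() % 10000), float(std::rand() % 10000), 10.0f, 10.0f };
                objects.push_back(r);
            }

            std::clock_t start = std::clock();
            for (std::size_t i = 0; i < objects.size(); ++i)
                tree.insert(objects[i]);          // only the inserts are timed
            std::clock_t stop = std::clock();

            std::cout << count << " objects: "
                      << double(stop - start) / CLOCKS_PER_SEC << " s\n";
        }
        return 0;
    }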

Now, when there is NO memory leak, the "insert into tree" performance is all over the place. It ranges anywhere from 10.5 seconds with ~15,000 objects to 1.5 seconds with ~18,000. There is no pattern whatsoever.

When I deliberately add in a leak, something as simple as a bare "new int;" on a line by itself (never assigned to anything), the performance instantly falls onto a nice gentle curve sloping from roughly 0 seconds for 100 objects to 1.5 seconds for the full 20,000.
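To be explicit, the "leak" is just a bare, unassigned allocation dropped into the insertion path, roughly like this (the surrounding function is a placeholder; only the "new int;" line is literal):

    void RTree::insert(const Rect& r)
    {
        new int;   // the leak: allocated, never assigned, never deleted
        // ... actual insertion logic ...
    }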

I'm very, very lost at this point. If you want source code I can include it, but it's huge, and literally the only line that makes a difference is "new int;"

Thanks in advance! -nick

+2  A: 

I'm not sure how you came up with this new int test, but it's not a very good way to fix things :) Run your code under a profiler and find out where the real delays are. Then concentrate on fixing the hot spots.

g++ has it built in - just compile with -pg

viraptor
So I'm relatively new to C++. I downloaded a class to help me time things, because I've been trying to make updates to the tree as fast as I possibly can (it's silly to remove an element and re-insert it if it's only going to move a little bit within the node it was already in...). I was timing everything with that class and things were going splendidly until I realized I was never actually deleting anything. Once I added the delete, all hell broke loose.
f34r
"proper" profiler works in a bit different way. I don't know which compiler are you using, but all of them have some profiler available. Instead of timing a single function, your application profile will include the timing of all functions you're calling with basic statistics broken down by the caller function (or by stack trace). Instead of guessing, try profiling the actual run and find out what is it that takes a long time. There has to be a reasonable explanation and you'll need to pinpoint the actual slow lines. Profilers don't modify the code itself, so they're safer than a timing class.
viraptor
A: 

Here is a possible explanation of what is happening (a theory):

  1. The compiler did not remove the empty new int.
  2. The new int is in one of the inner loops, or somewhere in your recursive traversal, where it gets executed a very large number of times.
  3. The overall RSS of the process increases, and eventually so does the total memory being used by the process.
  4. Page faults start happening because of this.
  5. Because of the page faults, the process becomes I/O bound instead of CPU bound.

End result: you see a drop in throughput. It would help if you could mention the compiler being used and the options you use to build the code.
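One way to check points 3 and 4 (a sketch, assuming a Windows build): log the process's working set and page-fault count between benchmark increments using GetProcessMemoryInfo from psapi.

    #include <windows.h>
    #include <psapi.h>
    #include <iostream>
    // Link against psapi.lib.

    // Print the current working set and cumulative page-fault count.
    // Call this between benchmark increments to see whether memory use
    // (and the page-fault rate) grows along with the slowdown.
    void ReportMemory()
    {
        PROCESS_MEMORY_COUNTERS pmc;
        if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc)))
        {
            std::cout << "Working set: " << pmc.WorkingSetSize / 1024 << " KB, "
                      << "page faults: " << pmc.PageFaultCount << "\n";
        }
    }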
Gangadhar
This is what I, and those I've talked to, have been leaning towards. Visual Studio 2005 compiler. All options: /O2 /GL /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /FD /EHsc /MD /fp:fast /Yu"stdafx.h" /Fp"Release\DestinyTest.pch" /Fo"Release\\" /Fd"Release\vc80.pdb" /W3 /nologo /c /Wp64 /Zi /TP /errorReport:prompt
f34r
I'd agree if it was a change of up to, let's say, 25% of the runtime. But "10.5 seconds with ~15000 objects to 1.5 with ~18000" is a 7x difference. The most I've ever seen was about a 4x difference, and that was in a tight loop written almost specifically to provoke page faults. So while it could be the real explanation, a bug in the code is a much more likely one :)
viraptor
+1  A: 

Without more information it's impossible to be sure.

However, I wonder if this is to do with heap fragmentation. By creating and freeing many blocks of memory you'll likely be creating a whole load of small fragments of memory linked together. The memory manager needs to keep track of them all so it can allocate them again if needed.

Some memory managers, when you free a block, try to "merge" it with the surrounding blocks of memory, and on a highly fragmented heap this can be very slow as they search for those surrounding blocks. Not only that, but if you have limited physical memory, following the chain of memory blocks can "touch" many physical pages, causing a whole load of extremely slow page faults whose cost varies wildly depending on exactly how much physical memory the OS decides to give that process.

By leaving some memory un-freed you change this pattern of access, which might make a large difference to the speed. You might, for example, be forcing the runtime library to allocate a new block of memory each time rather than having to track down a suitably sized existing block to reuse.

I have no evidence that this is the case in your program, but I do know that heap fragmentation is often the cause of slow programs when a lot of allocating and freeing is performed.
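To make the idea concrete, here is a small, contrived sketch (not from the original program) of an allocation pattern that fragments a heap: allocate many small blocks, free every other one, and the free list ends up full of scattered holes that a later, larger allocation has to search through or merge.

    #include <cstddef>
    #include <vector>

    int main()
    {
        const std::size_t kCount = 100000;
        std::vector<char*> blocks(kCount);

        // Allocate many small, similarly sized blocks back to back.
        for (std::size_t i = 0; i < kCount; ++i)
            blocks[i] = new char[32];

        // Free every other block: the heap is now riddled with small
        // holes, and the allocator must track (and possibly merge) them.
        for (std::size_t i = 0; i < kCount; i += 2)
        {
            delete[] blocks[i];
            blocks[i] = 0;
        }

        // A larger request cannot reuse any single 32-byte hole, so the
        // allocator either walks/merges free blocks or grows the heap.
        char* big = new char[4096];

        delete[] big;
        for (std::size_t i = 1; i < kCount; i += 2)
            delete[] blocks[i];
        return 0;
    }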

John Burton
This also might be what is going on. Is there a way I can include pictures? I have graphs for every possible scenario.
f34r
A: 

Solution: do not run from inside Visual Studio; actually run the .exe file directly. I figured this out because that's what the profilers were doing, and the numbers were magically dropping. I checked memory usage, and the version running outside the IDE (the one giving me EXCEPTIONAL times) was not blowing up to excessively huge sizes.

Solution to why the hell Visual Studio does ridiculous crap like this: No clue.

f34r
Because programmers who care about the performance of their code generally aren't running it through the IDE when testing the performance?
Bill
If you attach the debugger, the performance will go down, *of course*.
Axel Gneiting
A: 

I am taking a stab in the dark here, but the problem could be the way the heap gets fragmented. You said that you are creating and destroying large numbers of objects. I will assume that the objects are all of different sizes.

When one allocates memory on the heap, a cell of the size needed is broken off from the heap. When the memory is freed, the cell is added to a free list. On the next allocation, the allocator walks the free list until a cell that is big enough is found. When doing large numbers of allocations, the free list can get rather long, and walking it can take a non-trivial amount of time.

Now, an int is rather small, so when you do your new int it may well eat up all the small heap cells on the free list and thus dramatically speed up the larger allocations that follow.

The chances are, however, that you are allocating and freeing similarly sized objects. If you use your own free lists, you will save yourself many heap walks and may dramatically improve performance. This is essentially what some STL allocator implementations do to improve performance.
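A minimal sketch of that idea for fixed-size tree nodes (a hand-rolled free list; the Node type and sizes are placeholders, not the asker's actual code):

    #include <cstddef>
    #include <new>

    struct Node { char payload[64]; };  // placeholder for the R-tree node

    // A tiny fixed-size free-list allocator: freed nodes are threaded onto a
    // singly linked list and handed back on the next allocation, so most
    // allocations never touch the general-purpose heap.
    class NodePool
    {
        union Slot { Slot* next; char storage[sizeof(Node)]; };
        Slot* free_;
    public:
        NodePool() : free_(0) {}

        void* allocate()
        {
            if (free_) { Slot* s = free_; free_ = s->next; return s; }
            return ::operator new(sizeof(Slot));   // fall back to the heap
        }

        void deallocate(void* p)
        {
            Slot* s = static_cast<Slot*>(p);
            s->next = free_;                       // push onto the free list
            free_ = s;
        }

        // Frees only the slots currently on the free list; nodes still in
        // use are the caller's responsibility (this is just a sketch).
        ~NodePool()
        {
            while (free_) { Slot* s = free_; free_ = s->next; ::operator delete(s); }
        }
    };

    // Usage sketch:
    //   NodePool pool;
    //   Node* n = new (pool.allocate()) Node();   // placement new
    //   n->~Node();
    //   pool.deallocate(n);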

doron