I need to process a list of files. The processing action should not be repeated for the same file. The code I am using for this is -

#include <map>
#include <string>
#include <vector>

using namespace std;

struct File { string sFilename; /* ... */ };

void foo(File* pFile); //foo() is the action to do for each file

vector<File*> gInputFileList; //Can contain duplicates
map<string, File*> gProcessedFileList; //Using map to avoid linear search costs

void processFile(File* pFile)
{
    //operator[] default-constructs a NULL entry if the key is absent
    File* pProcessedFile = gProcessedFileList[pFile->sFilename];
    if(pProcessedFile != NULL)
        return; //Already processed

    foo(pFile);
    gProcessedFileList[pFile->sFilename] = pFile;
}

int main()
{
    size_t n = gInputFileList.size(); //Array syntax (iterator syntax gives identical performance)
    for(size_t i = 0; i < n; i++){
        processFile(gInputFileList[i]);
    }
}

The code works correctly, but...

My problem is that when the input size is 1000, it takes 30 minutes - HALF AN HOUR - on Windows/Visual Studio 2008 Express. For the same input, it takes only 40 seconds to run on Linux/gcc!

What could be the problem? The action foo() takes only a very short time to execute, when used separately. Should I be using something like vector::reserve for the map?

EDIT, EXTRA INFORMATION

What foo() does is:

  1. it opens the file
  2. reads it into memory
  3. closes the file
  4. the contents of the file in memory are parsed
  5. it builds a list of tokens; I'm using a vector for that (see the sketch below)
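In outline, foo() looks roughly like the sketch below. This is only an illustration of its shape, with made-up names, not the actual code:

#include <fstream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

void foo(File* pFile)
{
    ifstream in(pFile->sFilename.c_str()); // 1. open the file
    stringstream buffer;
    buffer << in.rdbuf();                  // 2. read it into memory
    in.close();                            // 3. close the file

    vector<string> tokens;                 // 5. the list of tokens
    //tokens.reserve(1024);                // a guessed capacity could cut reallocations
    istringstream parser(buffer.str());    // 4. parse the in-memory contents
    string token;
    while(parser >> token)
        tokens.push_back(token);           // the std::vector add seen in the call stack
}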

Whenever I break into the program (while running with the 1000+ file input set), the call stack shows that it is in the middle of a std::vector add.

+2  A: 

I very very strongly doubt that your performance problem is coming from the STL containers.

Try to eliminate (comment out) the call to foo(pFile) or any other method which touches the filesystem. Although running foo(pFile) once may appear fast, running it on 1000 different files (especially on Windows filesystems, in my experience) could turn out to be much slower (e.g. because of filesystem cache behaviour.)

EDIT

Your initial post was claiming that BOTH debug and release builds were affected. Now you are withdrawing that claim.

Be aware that in DEBUG builds:

  1. the STL implementation performs extra checks and assertions
  2. heap operations (memory allocation etc.) perform extra checks and assertions; moreover, under debug builds the low-fragmentation heap is disabled (up to a 10x overall slowdown in memory allocation)
  3. no code optimizations are performed, which may result in further STL performance degradation (the STL often relies heavily on inlining, loop unrolling, etc.)

With 1000 iterations you are probably not affected by the above (not at the outer loop level at least) unless you use STL/the heap heavily INSIDE foo().
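If you must measure with a debug build, the iterator checks can at least be switched off. A sketch for VC9 (this macro must be defined before any standard header is included, identically in every translation unit):

//define before including ANY standard header, in every translation unit
#define _HAS_ITERATOR_DEBUGGING 0 //turn off debug-build iterator debugging checks
#include <map>
#include <vector>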

vladr
The files that have been processed already are stored in the map, so that step is correct.
Dennis Zickefoose
No, gProcessedFileList[] is a cache of processed files to avoid reprocessing the same one repeatedly.
kibibu
I will check whether the files are getting closed.
sonofdelphi
1. At the time of the initial post, I had unknowingly checked the behavior with the debug build, thinking it was the release build. I am sorry about that. 2. foo() also uses the STL quite heavily, as I've indicated in the updated version of the question.
sonofdelphi
+2  A: 

Break into the program using the debugger at a random time, and the chances are very high that the stack trace will tell you where it's spending the time.

bk1e
I have updated this in the question. It is in the middle of a vector add.
sonofdelphi
+2  A: 

I would approach it like any performance problem. This means: profiling. MSVC has a built-in profiler, by the way, so it may be a good chance to get familiar with it.

Eli Bendersky
I don't think the Express edition has a built-in profiler. Can you suggest alternatives?
sonofdelphi
@sonofdelphi: try to compile the same program on Windows with mingw and use `gprof`
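Roughly like this, assuming MinGW's g++ is on the PATH (-pg adds gprof instrumentation, and running the instrumented program writes gmon.out):

g++ -pg -O2 main.cpp -o app.exe
app.exe
gprof app.exe gmon.out > profile.txt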
Eli Bendersky
+1  A: 

I would be astounded if the performance issues you are seeing have anything at all to do with the map class. Doing 1000 lookups and 1000 insertions should take a combined time on the order of microseconds. What is foo() doing?

Marcelo Cantos
I have updated the details of foo() in the question.
sonofdelphi
+1  A: 

Without knowing how the rest of the code fits in, I think the overall idea of caching processed files is a little flaky.

Try removing duplicates from your vector first, then process them all.
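For instance, a minimal sketch of one way to do that, assuming duplicates are identified by sFilename (the comparator names here are made up):

#include <algorithm>

bool lessByName(const File* a, const File* b)  { return a->sFilename <  b->sFilename; }
bool equalByName(const File* a, const File* b) { return a->sFilename == b->sFilename; }

//sort + unique + erase removes duplicates; note this reorders the list
sort(gInputFileList.begin(), gInputFileList.end(), lessByName);
gInputFileList.erase(
    unique(gInputFileList.begin(), gInputFileList.end(), equalByName),
    gInputFileList.end());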

kibibu
+1  A: 

Try commenting out each block or major operation to determine which part actually causes the difference in execution time between Linux and Windows. I also don't think it would be because of the STL map. The problem may be inside foo(). It may be some file operation, as that is the only thing I can think of that would be costly in this case.

You may insert clock() calls in between operations to get an idea of the execution time.
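For example, around the processFile() call in the question's loop (a sketch):

#include <cstdio>
#include <ctime>

clock_t start = clock();
processFile(gInputFileList[i]); //the operation being timed
clock_t stop = clock();
printf("processFile: %.3f s\n", double(stop - start) / CLOCKS_PER_SEC);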

jasonline
+1  A: 

You say that when you break, you find yourself inside vector::add. You don't have a vector::add in the code you've shown us, so I suspect it's inside the foo function. Without seeing that code, it's going to be difficult to say what's up.

You might have inadvertently created a Shlemiel the Painter algorithm.
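To illustrate the pattern (the classic example, not your code):

#include <cstring>

//The classic Shlemiel: each strcat() rescans dst from the beginning to find
//the terminating NUL, so appending n pieces costs O(n^2) character reads.
void buildLine(char* dst, const char* const* pieces, int n)
{
    dst[0] = '\0';
    for(int i = 0; i < n; ++i)
        strcat(dst, pieces[i]);
}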

Mark Ransom
+1 for the link, interesting reading
Jamie Cook
I have not put the foo() code in the post; have only explained what it does.
sonofdelphi
"Shlemiel the Painter" is just a new name for what's already known (to me) as O(n2). But really, big-oh of n squared should be a complexity class in and of itself in the same manner as off-by-one is bug class in and of itself, because it's so common.
wilhelmtell
+15  A: 

In Microsoft Visual Studio, there's a global lock when accessing the Standard C++ Library in Debug builds, to protect against multithreading issues. This can cause big performance hits. For instance, our full test code runs in 50 minutes on Linux/gcc, whereas it needs 5 hours on Windows VC++ 2008. Note that this performance hit does not exist when compiling in Release mode, using the non-debug Visual C++ runtime.

Didier Trosset
I think this is it. I generated a new Release build and tried it; the performance improved to 240 seconds. Thanks for sharing the experience. Is there a workaround for this Debug STL locking?
sonofdelphi
Look at http://blogs.msdn.com/vcblog/archive/2009/06/23/stl-performance.aspx and the comments.
Francesco
I would also add that even in release mode, iterators in VC do extra checking that GCC doesn't do. Google _SECURE_SCL for more info.
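For reference, a sketch of switching that off (the macro must be defined before any standard header, consistently across all translation units):

#define _SECURE_SCL 0 //disable VC's checked iterators (on by default even in release)
#include <vector>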
stonemetal
+1  A: 

You can improve things somewhat if you ditch your map and partition your vector instead. This implies reordering the input files list. It also means you have to find a way of quickly determining if a file has been processed already, possibly by holding a flag in the File class. If it's ok to reorder the files list and if you can store that dirty flag in the File object then you can improve performance from O(n log m) to O(n), for n total files and m processed files.

#include <algorithm>
#include <functional>
// ...
// move the not-yet-processed files to the front; 'end' marks the boundary
vector<File*>::iterator end(partition(inputfiles.begin(), inputfiles.end(),
                                      not1(mem_fun(&File::is_processed))));
for_each(inputfiles.begin(), end, processFile);

If you can't reorder the files list, or if you can't change the File object, then you can replace the map with a vector and shadow each file in the input files list with a flag in a second vector at the same index. This will cost you O(n) space but will give you an O(1) check for the dirty state.

vector<File*> processed(inputfiles.size(), 0);

for( vector<File*>::size_type i(0); i != inputfiles.size(); ++i ) {
    if( processed[i] != 0 ) continue; // already done: O(1) check
    // ... process inputfiles[i] ...
    processed[i] = inputfiles[i];     // mark as done: O(1)
}

But be careful: You're dealing with two distinct pointers pointing at the same address, and that's the case for each pair of pointers in the two containers. Make sure one and only one pointer owns the pointee.

I don't expect either of these to be the solution for that performance hit, but they're worth trying nevertheless.

wilhelmtell
A: 

If you are doing most of your work on Linux, then I strongly suggest you only ever compile in Release mode on Windows. That makes life much easier, especially considering all of Windows' inflexible library-handling headaches.

brian