Hi, I have a really strange problem.
I have an application that launches some workers in parallel:
for (it = jobList.begin(); it != jobList.end(); it++) {
    DWORD threadId;
    Job job = *it;
    Worker *worker = new Worker(job);
    workers[i] = worker;
    threads[i++] = CreateThread((LPSECURITY_ATTRIBUTES)NULL, (DWORD)0, &launchThread, worker, (DWORD)0, &threadId);
}
WaitForMultipleObjects((DWORD)jobList.size(), threads, (BOOL)true, (DWORD)INFINITE);
The workers allocate a lot of memory, so I assume they synchronize on new, but that should be the only place where they synchronize with each other.
When I run the application on a single-core machine, everything is fine. When I run it on a multi-core machine, performance gets much worse, even worse than this version, which waits for each worker to finish before starting the next one:
for (it = jobList.begin(); it != jobList.end(); it++) {
    DWORD threadId;
    Job job = *it;
    Worker *worker = new Worker(job);
    workers[i] = worker;
    threads[i++] = CreateThread((LPSECURITY_ATTRIBUTES)NULL, (DWORD)0, &launchThread, worker, (DWORD)0, &threadId);
    WaitForSingleObject(threads[i-1], (DWORD)INFINITE);
}
Does anyone have a reasonable guess about what is going on?
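For reference, here is a minimal sketch of how the two launch strategies can be timed side by side. It is not the real application: it uses std::thread instead of CreateThread/WaitForMultipleObjects so that it stays portable, and JobFn, runParallel, runSequential and elapsedMs are hypothetical names introduced only for this illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for Worker: any callable that receives the job index.
using JobFn = std::function<void(std::size_t)>;

// Start every job first, then wait for all of them
// (the analogue of CreateThread in a loop followed by WaitForMultipleObjects).
inline void runParallel(std::size_t jobCount, const JobFn& job) {
    std::vector<std::thread> threads;
    threads.reserve(jobCount);
    for (std::size_t i = 0; i < jobCount; ++i)
        threads.emplace_back(job, i);
    for (auto& t : threads)
        t.join();
}

// Start one job and wait for it before starting the next
// (the analogue of WaitForSingleObject inside the loop).
inline void runSequential(std::size_t jobCount, const JobFn& job) {
    for (std::size_t i = 0; i < jobCount; ++i) {
        std::thread t(job, i);
        t.join();
    }
}

// Wall-clock time of an arbitrary callable, in milliseconds.
template <typename F>
double elapsedMs(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling elapsedMs([&] { runParallel(n, job); }) and elapsedMs([&] { runSequential(n, job); }) with the same job on both machines gives the two numbers being compared above.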
EDIT:
I have run some tests, and I've found that:
- Replacing the allocator with a state-of-the-art parallel allocator doesn't help.
- The multithreaded application performs better on a machine with a Core 2 Duo (two cores with a shared L2 cache) than on a dual Xeon (two processors with separate caches).
I suspect the application has a memory access bottleneck, but how can I check whether that is really the problem, or should I be looking elsewhere?
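One hypothesis that could be tested in isolation is contention on shared cache lines (false sharing). Nothing above confirms that this is the cause. The sketch below only shows how such a probe might look: each thread increments its own counter, once with the counters packed next to each other and once with each counter padded to an assumed 64-byte cache line. PackedCounter, PaddedCounter, hammerMs and kAssumedCacheLine are hypothetical names, and the 64-byte line size is an assumption about the hardware.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines. Adjust for the actual CPU.
constexpr std::size_t kAssumedCacheLine = 64;

// Counters stored back to back: neighbours can land on the same cache line.
struct PackedCounter {
    std::atomic<unsigned long> value{0};
};

// Counters spaced one assumed cache line apart, so two of them
// can never share a line of that size.
struct PaddedCounter {
    std::atomic<unsigned long> value{0};
    char pad[kAssumedCacheLine - sizeof(std::atomic<unsigned long>)];
};

// One thread per counter; each thread increments only its own counter.
// Returns the wall-clock time in milliseconds.
template <typename Counter>
double hammerMs(std::vector<Counter>& counters, unsigned long iterations) {
    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    threads.reserve(counters.size());
    for (std::size_t i = 0; i < counters.size(); ++i) {
        threads.emplace_back([&counters, i, iterations] {
            for (unsigned long n = 0; n < iterations; ++n)
                counters[i].value.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& t : threads)
        t.join();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

If hammerMs on a std::vector<PackedCounter> is much slower than on a std::vector<PaddedCounter> with two threads on the dual Xeon, but not on the Core 2 Duo, that would point toward cache-line traffic between the processors. If the two timings are similar, this particular hypothesis is less likely and a profiler with hardware cache-miss counters would be the next thing to try.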