My computer has a dual-core Core 2 Duo. I have multithreaded a slow part of my application, but CPU usage still never exceeds 50% and the application still lags after many iterations. Is this normal? I was hoping it would reach 100% CPU, since I'm dividing the work across 4 threads. Why might it still be capped at 50%?

Thanks

See http://stackoverflow.com/questions/3190158/what-am-i-doing-wrong-multithreading for my implementation, except that I have since fixed the issue that code was having.

+1  A: 

Your description gives us very little to go on, but let me see if I can help:

  1. You have implemented a lock-based system, but the second, third, and fourth threads do little useful work because the resource they need is constantly locked. (This is the first area I'd look into.)
  2. You're not actually using more than a single thread. Somehow, somewhere, those other threads are never started or initialized. (It sounds stupid, but I've done this before.)

Look into those areas first.
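
To illustrate the first point, here is a minimal sketch of one way to avoid lock contention: each thread works on a private local value and takes the shared lock only once, at the end. The function `parallel_sum` and its workload are hypothetical, not the asker's code, and the sketch assumes C++11 `std::thread` (with `boost::thread` the structure would be similar). It also assumes `num_threads` is at least 1.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical workload: sum the integers in [0, n) using num_threads threads.
// Assumes num_threads >= 1.
long long parallel_sum(std::size_t n, unsigned num_threads) {
    long long total = 0;
    std::mutex total_mutex;
    std::vector<std::thread> workers;
    const std::size_t chunk = n / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        // The last thread also takes the remainder of the range.
        const std::size_t end = (t + 1 == num_threads) ? n : begin + chunk;

        workers.emplace_back([&total, &total_mutex, begin, end] {
            long long local = 0;  // private to this thread, so no lock is needed
            for (std::size_t i = begin; i < end; ++i) {
                local += static_cast<long long>(i);
            }
            // The shared state is locked once per thread, not once per iteration.
            std::lock_guard<std::mutex> guard(total_mutex);
            total += local;
        });
    }

    // Joining every worker also confirms that each thread was actually started.
    for (std::thread& w : workers) {
        w.join();
    }
    return total;
}
```

If the lock were taken inside the inner loop instead, the threads would spend most of their time waiting on each other, which would look much like the 50% cap described in the question.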

wheaties
The call stack says there are 4 threads running, and none of the threads ever access the same element.
Milo
@user146780: I am unsure how you work out the number of running threads by looking at a call stack. Would you care to elaborate?
Anon.
Well, I added a giant for loop and got the CPU up to 100%, but as James McNellis has just told me, I think I'm doing too many heap allocations.
Milo
+1  A: 

Looking at your code, you are making a huge number of allocations in your tight loop. In each iteration you dynamically allocate two two-element vectors and then push them back onto the result vector (thus copying both of them); that last push_back will occasionally cause a reallocation and a copy of the vector's contents.

Heap allocation is relatively slow, even if your implementation uses a fast, fixed-size allocator for small blocks. In the worst case, the general-purpose allocator may even use a global lock; if so, it will obliterate any gains you might get from multithreading, since each thread will spend a lot of time waiting on heap allocation.

Of course, profiling would tell you whether heap allocation is constraining your performance or whether it's something else. I'd make two concrete suggestions to cut back your heap allocations:

  • Since every instance of the inner vector has two elements, you should consider using a std::array (or std::tr1::array or boost::array); the array "container" doesn't use heap allocation for its elements (they are stored like a C array).
  • Since you know roughly how many elements you are going to put into the result vector, you can reserve() sufficient space for those elements before inserting them.
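
The two suggestions can be combined as in the following sketch. The names `Pair` and `build_pairs` and the values stored are hypothetical stand-ins for the loop in the question; the sketch assumes C++11 `std::array` (`std::tr1::array` or `boost::array` could be used in much the same way).

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical element type: two values stored inline, like a C array,
// so creating one does not allocate on the heap.
using Pair = std::array<double, 2>;

// Hypothetical stand-in for the per-thread loop: two pairs per iteration.
std::vector<Pair> build_pairs(std::size_t count) {
    std::vector<Pair> result;
    // Allocate once up front so push_back never has to reallocate and copy.
    result.reserve(2 * count);

    for (std::size_t i = 0; i < count; ++i) {
        const double x = static_cast<double>(i);
        result.push_back(Pair{{x, x + 1.0}});
        result.push_back(Pair{{x + 1.0, x}});
    }
    return result;
}
```

With this layout the loop performs a single heap allocation in total, instead of two or more per iteration. Whether that removes the bottleneck is something only profiling can confirm.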
James McNellis