views: 3491

answers: 12

I wrote a multi-threaded program which does some CPU-heavy computation with a lot of floating point operations. More specifically, it's a program which compares animation sequences frame by frame, i.e. it compares frame data from animation A with all the frames in animation B, for all frames in animation A. I carry out this intensive operation for different animations in parallel, so the program can be working on the A-B pair, the B-C pair and the C-A pair at the same time. The program uses QtConcurrent and a "map" function which maps a container of motions onto a function. QtConcurrent manages the thread pool for me; I am working on an Intel Quad Core processor, so it spawns 4 threads.
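To make the setup concrete, here is a rough sketch of what I'm doing. The actual code uses QtConcurrent::map; in this sketch I stand in std::async for it, and the names (Motion, compare_pair, the frame metric) are simplified placeholders, not my real code:

```cpp
#include <future>
#include <utility>
#include <vector>
#include <cmath>

// A "motion" reduced to one feature value per frame, for illustration only.
using Motion = std::vector<double>;

// Hypothetical frame-by-frame comparison: every frame of a against every frame of b.
double compare_pair(const Motion& a, const Motion& b) {
    double total = 0.0;
    for (double fa : a)
        for (double fb : b)
            total += std::fabs(fa - fb);   // stand-in for the real frame metric
    return total;
}

// Map each (a, b) pair onto compare_pair in parallel -- roughly what
// QtConcurrent::map does with its internal thread pool.
std::vector<double> compare_all(const std::vector<std::pair<Motion, Motion>>& pairs) {
    std::vector<std::future<double>> futures;
    for (const auto& p : pairs)
        futures.push_back(std::async(std::launch::async, compare_pair,
                                     std::cref(p.first), std::cref(p.second)));
    std::vector<double> results;
    for (auto& f : futures)
        results.push_back(f.get());
    return results;
}
```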

Now, the problem is that my process destroys my CPU. The usage is a constant 100%, and I actually get a Blue Screen of Death if I run my program on a big enough set of motions (page fault in non-paged area). I suspect that this is because my computer is overclocked. However, could this be because of the way I coded my program? Some very intensive benchmarking tools I used to test my machine's stability never crashed my PC. Is there any way to control how my program uses my CPU to reduce the load? Or perhaps I am misunderstanding my problem?

+9  A: 

Overclocking PCs can lead to all sorts of strange problems. If you suspect that to be the root cause of your problem, try clocking it back to a reasonable range and retry your tests.

It could also be some sort of strange memory bug where you corrupt your RAM in a way that Windows (I guess that's the OS, because of the BSOD) cannot recover from anymore (very unlikely, but who knows).

Another possibility I can think of is that you've got some error in your threading implementation which kills Windows.

But first, I'd look at the overclocking issue...

Kosi2801
"I guess that OS, because of BSOD": because of the name, right? Not that it crashed? :D
Simon Buchan
BSOD = Blue Screen Of Death, mostly used for the blue error screen on Windows operating systems, so I guessed Windows because of the name. The crashing behaviour would be no good indicator of Windows under these conditions, because every OS will crash if the hardware is operated beyond its limits.
Kosi2801
A: 

I think a blue screen of death is caused when a kernel memory region gets corrupted, so using multithreading to carry out parallel operations should not, by itself, be the reason for this.

That said, if you are creating multiple threads, each carrying out heavy floating point operations, then your CPU utilization will certainly reach 100%.

It would be better if you added a short sleep in each thread so that other processes get a chance to run. You may also try reducing the priority of the threads.
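A minimal sketch of that idea, assuming the heavy work can be split into chunks (portable C++ here; on Windows you would additionally lower the priority with SetThreadPriority, which is not shown):

```cpp
#include <chrono>
#include <thread>

// Stand-in for one chunk of the real floating-point work.
long heavy_chunk(long start, long count) {
    long sum = 0;
    for (long i = start; i < start + count; ++i)
        sum += i % 7;
    return sum;
}

// After each chunk, sleep briefly so the scheduler can run other processes;
// even a 1 ms pause between short chunks noticeably lowers average CPU usage.
long run_with_breaks(long total, long chunk) {
    long sum = 0;
    for (long done = 0; done < total; done += chunk) {
        sum += heavy_chunk(done, chunk);
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return sum;
}
```

The trade-off is that the sleeps directly extend the total run time, so this is about being a better neighbour, not about going faster.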

Alien01
+1  A: 

It's all too easy to blame the hardware. I would suggest you try running your program on a different system and see how that turns out with the same data.

Probably you have a bug.

Subtwo
A program causing a system crash isn't normal. An application should never be able to crash the operating system. If it's not hardware, then it is more likely a bug in the operating system.
dreamlax
In my experience a BSOD comes from a bug in the OS and/or a bug in kernel drivers, so I don't think the OS will malfunction with a perfectly working application. What is the least tested part here? I would say the application, not the hardware or the OS.
Subtwo
My point is that a bug in the OS enables a malfunctioning application to BSOD the system, whereas the system perhaps wouldn't BSOD if the application didn't have a bug.
Subtwo
It could simply be a hardware failure due to a combination of the overclocking and his usage (which will be thrashing the CPU cache like crazy)
Simon Buchan
@Subtwo: I used to work in hardware repair fixing motherboards and laptops for Toshiba. Hardware isn't perfect, and it's usually the last place to lay blame, but now that I have worked in both software development and hardware repair I can say BSODs are quite commonly caused by hardware faults.
dreamlax
Yes, I know all this. But considering the OP claims to have run extensive benchmarks on the hardware without trouble, I would certainly not rule out the possibility of him having a bug in his program. I admit that overclocking is flaky and would increase the possibility of hardware problems, sure.
Subtwo
+4  A: 

I suspect that this is because my computer is overclocked.

It's definitely possible. Try setting it to normal speed for a while.

could this be because of the way I coded my program?

A program running in user mode is very unlikely to cause a BSOD.

Jimmy J
+2  A: 

The overclocking is the most likely cause of the instability. With any CPU-intensive algorithm there is going to be some CPU thrashing. The overclocking notwithstanding, I would find a good performance profiler to locate the performance bottlenecks. Never guess where the problem is: you could spend months optimizing something that has no real effect on performance, or performance could even decrease.

codeelegance
+5  A: 

The kind of operation you've described is already highly parallelizable, so running more than one job at once may actually hurt performance. The reason is that the cache of any processor is of limited size, and the more you try to do concurrently, the smaller each thread's share of the cache becomes.

You might also look into using your GPU to soak up some of the processing load. Modern GPUs are vastly more efficient at most kinds of video transformation than CPUs of a similar generation.

TokenMacGuy
A: 

On the Windows platform, insert, after some amount of work, a call that tells the scheduler you are willing to yield the CPU to other processes. Make a call to the sleep function like this:

Sleep(0);

lsalamon
A: 

Without the BSOD error code (useful for looking the problem up), it is a bit harder to help you with this one.

You might try physically reseating your memory (take it out and put it back in). I, and some others I know, have worked on a few machines where this was needed. For instance, I was once trying to upgrade OS X on a machine and it kept crashing... finally I popped the memory out, dropped it back in, and everything was fine.

TofuBeer
A: 

Sleep(1); will cut CPU usage in half. I ran into the same problem working with a CPU intensive algorithm.

Erik Ahlswede
Only if your work unit is 1ms long and you aren't using multiple threads. If so, it will also cut your *speed* in half.
Simon Buchan
+1  A: 

At a guess, I would say you are running on a 4-core machine (given the 100% usage), and parallelizing will actively hurt your performance if you use more threads than cores. Make only one thread per CPU core, and whatever you do, never have data accessed by different threads at the same time; the cache-locking algorithms in most multi-core CPUs will absolutely slaughter your performance. In this case, on an N-core CPU processing L-frame animations, I would use thread 1 on frames 0-(L/N), thread 2 on frames (L/N)-(2*L/N), ..., thread N on frames ((N-1)*L/N)-L. Do the different combinations (A-B, B-C, C-A) in sequence so you don't thrash your cache; it should also be simpler to code.
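That partitioning could be sketched like this (std::thread rather than Qt, and process_frames is a hypothetical per-range worker; each thread owns a disjoint frame range and its own output slot, so no data is shared between threads mid-flight):

```cpp
#include <thread>
#include <vector>

// Each thread works on its own half-open range [begin, end) of frame indices
// and writes only to its own output slot -- no shared mutable data.
void process_frames(long begin, long end, long* out) {
    long acc = 0;
    for (long f = begin; f < end; ++f)
        acc += f;                    // stand-in for the per-frame comparison
    *out = acc;
}

// Split L frames across N threads: thread i gets frames [i*L/N, (i+1)*L/N).
long process_animation(long L, int N) {
    std::vector<long> partial(N, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < N; ++i)
        threads.emplace_back(process_frames,
                             (long)i * L / N, (long)(i + 1) * L / N, &partial[i]);
    for (auto& t : threads)
        t.join();
    long total = 0;
    for (long p : partial)
        total += p;                  // combine per-thread results after joining
    return total;
}
```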

As a side note: real computation like this should be using 100% CPU; it means it's going as fast as it can.

Simon Buchan
Thanks for the tip, Simon. I will definitely have a look at ways in which I can make my program more cache friendly
sneg
+1  A: 

Look into using SIMD operations. I think you'd want SSE in this case. They're often a better first step than parallelization as they are easier to get correct and provide a pretty hefty boost to most linear algebra types of operations.
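To illustrate the idea, here is a small SSE sketch that sums a float array four lanes at a time (x86/x86-64 only; this is a generic example, not the questioner's frame metric):

```cpp
#include <immintrin.h>  // SSE intrinsics (x86/x86-64 only)
#include <cstddef>

// Horizontal sum of a float array, processing four elements per SSE register.
float sse_sum(const float* data, std::size_t n) {
    __m128 acc = _mm_setzero_ps();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(data + i));   // four adds per iteration
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i)                                   // scalar tail
        sum += data[i];
    return sum;
}
```

The same pattern (wide loads, vertical ops, a scalar tail) carries over to distance metrics and other linear-algebra-style inner loops.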

Once you get it using SIMD, then look into parallelizing. It sounds like you're slamming the CPU too, so you could perhaps do with some sleeps instead of busy-waits, and make sure you're cleaning up or reusing threads properly.

Dan Olson
Just as a side note, I did try using SIMD operations, but then I tried compiling with the /arch:SSE2 flag. The performance looked very similar, so I assume the compiler does a pretty good job of using the extended instruction set.
sneg
+2  A: 

There are some excellent answers here.

I would only add, from the perspective of having done lots of performance tuning: unless each thread has been aggressively optimized, chances are it has lots of room for cycle reduction.

To make an analogy with a long-distance auto race, there are two ways to try to win:

  1. Make the car go faster
  2. Make fewer stops and side-trips

In my experience, most software as first written is quite far from taking the most direct route, especially as the software gets large.

To find wasted cycles in your program, as Kenneth Cochran said, never guess. If you fix something without having proved that it is a problem, you are investing in a guess.

The popular way to find performance problems is to use profilers.

However, I do this a lot, and my method is this: http://www.wikihow.com/Optimize-Your-Program%27s-Performance

Mike Dunlavey
Thanks for a nice conclusion to this topic, Mike. I had trouble selecting the answer because they were all equally useful and all gave me something to think about. Your answer, however, gives a good closure.
sneg
Thanks for a good question. There are goofy ideas about performance infecting the world, and your question sheds light.
Mike Dunlavey