Hi,
My Professor found out this interesting experiment of 3D Linearly separable Kernel Convolution using SSE and OpenMP, and gave the task to me to benchmark the statistics on our system. The author claims a crazy 18 fold speedup from the serial approach! Might not be always, but we were expecting at least a 2-4 times speedup running this on a Dual Core Intel.
Alas, we could find exactly no speedup. The serial code performs always better, with or without OpenMP.
I am using Linux, and observed a certain trend...when no other processes are running on the system, after a while the loadavg starts increasing, and the the %CPU utilization falls down.
Another probable false positive which I ran into accidentally...I started the program, then immediately paused it. Then I ran it on background with bg, and saw a speedup of more than 2. This happens all the time!
Any advice would be great.
Thanks, Sayan