views:

217

answers:

4

We are working on a video processing application using EmguCV and recently had to do some pixel level operation. I initially wrote the loops to go across all the pixels in the image as follows:

for (int j = 0; j < Img.Width; j++ )
{
    for (int i = 0; i < Img.Height; i++)
    {
        // Pixel operation code
    }
}

The time to execute the loops was pretty bad. Then I posted on the EmguCV forum and got a suggestion to switch the loops like this:

for (int j = Img.Width; j-- > 0; )
{
    for (int i = Img.Height; i-- > 0; )
    {
        // Pixel operation code
    }
}

I was very surprised to find that the code executed in half the time!

The only thing I can think of is the comparison that takes place in the loops each time accesses a property, which it no longer has to. Is this the reason for the speed up? Or is there something else? I was thrilled to see this improvement. And would love it if someone could clarify the reason for this.

+2  A: 

It's because the CPUs are like hockey players, they go faster when going backward ;-)

More seriously:
This is not related in the direction of the loop in any way, but rather to the fact that the in the original construct, the loop control conditions implied dereferencing the Img object to index to its Width or Height property (for each and single iteration in the loops), whereby the second construct evaluates these properties only once.
Also, the fact that the new condition tests against the value 0, saves even the loading of an immediate value. This probably explains the difference (assuming the work done inside the inner was relatively minimal, i.e. +/- the same as work to test an Object.Property, since you indicate a roughly 50% gain).

Edit:
see Michael Stum's answer, which indicates that the Img.Width/Height reference is even more costly than thought. As it sometimes happens with properties, the implementation of the object may run a significant amount of code to produce the value (for example it may do a bunch of math to get to the width, each time, rather than somehow caching it etc..). This seems to be the case with this Img object, hence the interest to do this only once (if you are sure that the value will remain constant for the duration of the loop logic).

mjv
+4  A: 

It's not the loop reversal that speeds things up -- it's the fact that you're accessing the Width and Height properties far fewer times.

Ben M
+14  A: 

I assume you are using the System.Drawing.Image class? Looking at the implementation of .Width and .Height I see they do a function call into GDI+ (GdipGetImageHeight and GdipGetImageWidth in gdiplus.dll), which seems to be rather expensive.

By going backwards you make that call once, rather than in every iteration.

Michael Stum
+1. very good point!
Mitch Wheat
Excellent point! We often assume that referencing these properties is a cheap operation, forgetting that the underlying object may "take its sweet time" in producing the value...
mjv
Properties are normally supposed to be cheap, and maybe they are, but in a loop it adds up rapidly.
Michael Stum
+14  A: 

The difference isn't the cost of branching, it's the fact that you are fetching an object property Img.Width and Img.Height in the inner loop. The optimizer has no way of knowing that these are constants for purposes of that loop.

You should get the same performance speedup by doing this.

const int Width = Img.Width;
const int Height = Img.Height;
for (int j = 0; j < Width; j++ )
{
    for (int i = 0; i < Height; i++)
    {
        // Pixel operation code
    }
}

Edit:

As Joshua Suggests, putting Width in the inner loop will have you walking through the memory sequentially, which will be better cache coherency, and might be faster. (depends on how big your bitmap is).

const int Width = Img.Width;
const int Height = Img.Height;
for (int i = 0; i < Height; i++)
{
    for (int j = 0; j < Width; j++ )
    {
        // Pixel operation code
    }
}
John Knoeller
Also, almost loop over height outside of width. Your L1 cache will thank you.
Joshua
@Joshua: And Excellent suggestion, Width on the inner loop will have _much_ better locality of reference.
John Knoeller
If you're iterating over entire 2d arrays, knowing how they are laid out in memory is essential if you need the speed. IIRC .Net, like most everything, is is row-major order. http://en.wikipedia.org/wiki/Row-major_order
Robert Paulson