views:

61

answers:

2

Hi, I'm iterating over a 3 dimensional array (which is an image with 3 values for each pixel) to apply a 3x3 filter to each pixel as follows:

//For each value on the image    
for (i=0;i<3*width*height;i++){
    //For each filter value
    for (j=0;j<9;j++){
        if (notOutsideEdgesCondition){
            *(**(outArray)+i)+= *(**(pixelArray)+i-1+(j%3)) * (*(filter+j));
        } 
    }
}

I'm using pointer arithmetic because if I used array notation I'd have 4 loops and I'm trying to have the least possible number of loops. My problem is my notOutsideEdgesCondition is getting quite out of hands because I have to consider 8 border cases. I have the following handled conditions

  • Left Column: ((i%width)==0) && (j%3==0)
  • Right Column: ((i-1)%width ==0) && (i>1) && (j%3==2)
  • Upper Row: (i<width) && (j<2)
  • Lower Row: (i>(width*height-width)) && (j>5)

and still have to consider the 4 corner cases which will have longer expressions. At this point I've stopped and asked myself if this is the best way to go because If I have a 5 line long conditional evaluation it'll not only be truly painful to debug but will slow the inner loop. That's why I come to you to ask if there's a known algorithm to handle this cases or if there's a better approach for my problem. Thanks a lot.

A: 

A nice tip is to add an additional row at the top of the array, and another at the end (do the same for the columns).

These additionals rows/columns won't contain any information but they will ease the computation (no border cases). At the price of consuming more memory...

Just an idea :)

ereOn
i think it won't take that much more memory. i may give it a try
kirbuchi
Adding a border row around the entire array will work, but involves copying the array, which might not be the most efficient method (then again, it might be, depending on your processors caching strategies.)
swestrup
+1  A: 

Yes, there's a much better way. Write a fast loop to handle the cases where there is guaranteed to be no boundary problems. This will consist of the region from the second to the next-to-last columns and the second to next-to-last rows. Then you can write four routines to handle each of the sides (row 0, column 0, row N and column N) and you can hand code the last four points.

That said, there are also a heck of a lot faster ways of doing the addressing calculations you're doing.

swestrup
I'll try that, it'll take more coding but hopefully i can make it simpler. Also, could you point me in the right direction to make the addressing calculations faster?
kirbuchi
I did a quick google to try to find a tutorial on unrolling array addressing calculations for you, but didn't find one.In brief, avoid as much per-pixel addressing calculations as possible. Use a pointer that gets incremented from pixel to pixel as you work, and remove all modulo and multiply operations.To implement your 3x3 kernel addressing, you should either maintain 3 pointers (one to your current place in each row) or one pointer and a constant 'stride' (displacement between rows) that gets added in. Use multiple loops to know when to add/subtract the stride.Dang. No more characters
swestrup
Ok so now i have the 4 side cases handled and will hand-code the corner cases. The 3 pointer approach for the kernel seems best because I can individually disable each kernel row when addressing border cases. With that in mind I can proceed similarly with the image array and have an easier time separating them for later CUDA implementation which is my ultimate goal. Thanks a lot.PS: I'll look for a book or tutorial on array unrolling and addressing optimizations.
kirbuchi