ansaurus

Question

Answer 1

+1 A:

There are a few places to reuse temporary variables, but whether it would improve performance would have to be measured as dirkgently stated:

Change

  if (r1 >= 0 && c1 >= 0) A = data[r1 * step + c1]; 
  if (r1 >= 0 && c2 >= 0) B = data[r1 * step + c2]; 
  if (r2 >= 0 && c1 >= 0) C = data[r2 * step + c1]; 
  if (r2 >= 0 && c2 >= 0) D = data[r2 * step + c2];

to

  if (r1 >= 0) {
    int r1Step = r1 * step;
    if (c1 >= 0) A = data[r1Step + c1]; 
    if (c2 >= 0) B = data[r1Step + c2]; 
  }
  if (r2 >= 0) {
    int r2Step = r2 * step;
    if (c1 >= 0) C = data[r2Step + c1]; 
    if (c2 >= 0) D = data[r2Step + c2]; 
  }

You may actually end up doing the temp multiplactions too often in case your if statements rarely provides true.

Claus Broch 2010-09-08 14:11:46

if proper optimization flags are used, this will automatically be taken care of

Yogesh Arora 2010-09-08 21:35:22

Answer 2

+8 A:

Specialize for the edges so that you don't need to check for them in every row and column. I assume that this call is in a nested loop and is called a lot. This function would become:

inline float BoxIntegralNonEdge(IplImage *img, int row, int col, int rows, int cols) 
{
  float *data = (float *) img->imageData;
  int step = img->widthStep/sizeof(float);

  // The subtraction by one for row/col is because row/col is inclusive.
  int r1 = row - 1;
  int c1 = col - 1;
  int r2 = row + rows - 1;
  int c2 = col + cols - 1;

  float A(data[r1 * step + c1]), B(data[r1 * step + c2]), C(data[r2 * step + c1]), D(data[r2 * step + c2]);

  return std::max(0.f, A - B - C + D);
}

You get rid of a conditional and branch for each min and two conditionals and a branch for each if. You can only call this function if you already meet the conditions -- check that in the caller for the whole row once instead of each pixel.

I wrote up some tips for optimizing image processing when you have to do work on each pixel:

http://www.atalasoft.com/cs/blogs/loufranco/archive/2006/04/28/9985.aspx

Other things from the blog:

You are recalculating a position in the image data with 2 multiplies (indexing is multiplication) -- you should be incrementing a pointer.
Instead of passing in img, row, row, col and cols, pass in pointers to the exact pixels to process -- which you get from incrementing pointers, not indexing.
If you don't do the above, step is the same for all pixels, calculate it in the caller and pass it in. If you do 1 and 2, you won't need step at all.

Lou Franco 2010-09-08 14:12:25

Answer 3

A:

The compiler probably handles inling automatically where it's proper.

Without any knowledge about the context. Is the if(r1 >= 0 && c1 >= 0) check necessary?

Isn't it required that the row and col parameters are > 0?

float BoxIntegral(IplImage *img, int row, int col, int rows, int cols) 
{
  assert(row > 0 && col > 0);
  float *data = (float*)img->imageData; // Don't use C-style casts
  int step = img->widthStep/sizeof(float);

  // Is the min check rly necessary?
  int r1 = std::min(row,          img->height) - 1;
  int c1 = std::min(col,          img->width)  - 1;
  int r2 = std::min(row + rows,   img->height) - 1;
  int c2 = std::min(col + cols,   img->width)  - 1;

  int r1_step = r1 * step;
  int r2_step = r2 * step;

  float A = data[r1_step + c1];
  float B = data[r1_step + c2];
  float C = data[r2_step + c1];
  float D = data[r2_step + c2];

  return std::max(0.0f, A - B - C + D);
}

ronag 2010-09-08 14:12:33

Answer 4

A:

I am not sure if your problem lends itself to SIMD but this could potentially allow you to perform multiple operations on your image at once and give you a good performance improvement. I am assuming you are inlining and optimizing because you are performing the operation multiple times. Take a look at:

Compiler do have some support for Neon if the correct flags are enabled but you will probably need to roll out some on your own.

Edit To get compiler support for neon you will need to use the compiler flag -mfpu=neon

doron 2010-09-08 14:29:10

Are there any compiler flags to enable Neon support explicitly?

Diego 2010-09-08 15:25:13

@Diego - see edit

doron 2010-09-08 16:08:21

Answer 5

+1 A:

You aren't interested in four variables A, B, C, D, but only the combination A - B - C + D.

Try

float result(0.0f);
if (r1 >= 0 && c1 >= 0) result += data[r1 * step + c1];
if (r1 >= 0 && c2 >= 0) result -= data[r1 * step + c2];
if (r2 >= 0 && c1 >= 0) result -= data[r2 * step + c1];
if (r2 >= 0 && c2 >= 0) result += data[r2 * step + c2];

if (result > 0f) return result;
return 0f;

Ben Voigt 2010-09-08 14:36:02

`if (result > 0f)`.

Steve Jessop 2010-09-08 14:59:49

@Steve: You're right of course... I was remembering that the `std::max` function is used to establish a minimum, and from there on out all my reasoning was backwards.

Ben Voigt 2010-09-08 19:17:47

+1. Saves registers, good idea.

MSalters 2010-09-09 09:38:27

Answer 6

A:

Some of the examples say to initialize A, B, C and D directly and skip the initialization with 0, but this is functionally different than your original code in some ways. I would do this however:

inline float BoxIntegral(IplImage *img, int row, int col, int rows, int cols)  {

    const float *data = (float *) img->imageData;
    const int step = img->widthStep/sizeof(float);

    // The subtraction by one for row/col is because row/col is inclusive.
    const int r1 = std::min(row,          img->height) - 1;
    const int r2 = std::min(row + rows,   img->height) - 1;
    const int c1 = std::min(col,          img->width)  - 1;
    const int c2 = std::min(col + cols,   img->width)  - 1;

    const float A = (r1 >= 0 && c1 >= 0) ? data[r1 * step + c1] : 0.0f;
    const float B = (r1 >= 0 && c2 >= 0) ? data[r1 * step + c2] : 0.0f;
    const float C = (r2 >= 0 && c1 >= 0) ? data[r2 * step + c1] : 0.0f;
    const float D = (r2 >= 0 && c2 >= 0) ? data[r2 * step + c2] : 0.0f;

    return std::max(0.f, A - B - C + D);
}

like your original code, this will make A, B, C and D have a value either from data[] if the condition is true or 0.0f if the condition is false. Also, I would (as I have shown) use const wherever it is appropriate. Many compilers aren't able to improve code much based on const-ness, but it certainly can't hurt to give the compiler more information about the data it is operating on. Finally I have reordered the r1/r2/c1/c2 variables to encourage reuse of the fetched width and height.

Obviously you would need to profile to determine if any of this is actually an improvement.

Evan Teran 2010-09-08 14:37:01

ansaurus

tags:

views:

answers:

Optimizing C++ code for performance

related questions