+2  Q: 

How do I optimize

What I'm trying to do is take this code:

char naive_smooth_descr[] = "naive_smooth: Naive baseline implementation";

void naive_smooth(int dim, pixel *src, pixel *dst)
{
    int i, j;

    for (i = 0; i < dim; i++)
        for (j = 0; j < dim; j++)
            dst[RIDX(i, j, dim)] = avg(dim, i, j, src);
}

and replace the function call avg(dim, i, j, src); with the actual code at the very bottom of the page. Then, take that code and replace all the function calls in it with their actual code, and so on.

If you're asking why I'm doing all this, the reason is simple: getting rid of function calls makes the program run faster, and I'm trying to reach the lowest cycles per element for the code above by replacing every function call with the actual code.

Now I'm really just having a lot of trouble doing this. Do I take the code with the brackets and then just copy and paste? Do I leave out the brackets? Do I include the beginning of the code, for example, static pixel avg(int dim, int i, int j, pixel *src) and then the brackets and then the code to replace the function call?
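To make the copy-and-paste mechanics concrete, here is a minimal sketch of inlining by hand. It assumes a pixel type with three unsigned short fields (the question never shows its definition), and average_two is a hypothetical function used only for illustration. The rule applied to each call: drop the function header, drop the trailing return;, and substitute every parameter with the argument that was passed, so sum->red becomes sum.red because the call passed &sum. The callee's outer braces can go too, unless the pasted body declares locals that clash with the caller's; keeping them gives those locals their own scope.

```c
/* Assumed pixel layout; the question does not show it. */
typedef struct { unsigned short red, green, blue; } pixel;

/* From the question. */
typedef struct { int red, green, blue, num; } pixel_sum;

/* Hypothetical example: the average of two pixels, with the bodies of
   initialize_pixel_sum, accumulate_sum and assign_sum_to_pixel pasted
   in by hand instead of called. */
pixel average_two(pixel a, pixel b)
{
    pixel_sum sum;
    pixel out;

    /* was: initialize_pixel_sum(&sum);  sum->x becomes sum.x */
    sum.red = sum.green = sum.blue = 0;
    sum.num = 0;

    /* was: accumulate_sum(&sum, a);  parameter p becomes a */
    sum.red += (int) a.red;
    sum.green += (int) a.green;
    sum.blue += (int) a.blue;
    sum.num++;

    /* was: accumulate_sum(&sum, b);  parameter p becomes b */
    sum.red += (int) b.red;
    sum.green += (int) b.green;
    sum.blue += (int) b.blue;
    sum.num++;

    /* was: assign_sum_to_pixel(&out, sum);  current_pixel->x becomes out.x */
    out.red = (unsigned short) (sum.red / sum.num);
    out.green = (unsigned short) (sum.green / sum.num);
    out.blue = (unsigned short) (sum.blue / sum.num);

    return out;
}
```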

I am going to paste all the code here:

/* A struct used to compute averaged pixel value */

typedef struct {

    int red;
    int green;
    int blue;
    int num;

}  pixel_sum;

/* Compute min and max of two integers, respectively */


static int min(int a, int b) { return (a < b ? a : b); }

static int max(int a, int b) { return (a > b ? a : b); }



/*
 * initialize_pixel_sum - Initializes all fields of sum to 0
 */
static void initialize_pixel_sum(pixel_sum *sum)
{
    sum->red = sum->green = sum->blue = 0;
    sum->num = 0;
    return;
}

/*
 * accumulate_sum - Accumulates field values of p in corresponding
 * fields of sum
 */
static void accumulate_sum(pixel_sum *sum, pixel p)
{
    sum->red += (int) p.red;
    sum->green += (int) p.green;
    sum->blue += (int) p.blue;
    sum->num++;
    return;
}


/*
 * assign_sum_to_pixel - Computes averaged pixel value in current_pixel
 */
static void assign_sum_to_pixel(pixel *current_pixel, pixel_sum sum)
{
    current_pixel->red = (unsigned short) (sum.red/sum.num);
    current_pixel->green = (unsigned short) (sum.green/sum.num);
    current_pixel->blue = (unsigned short) (sum.blue/sum.num);
    return;
}

This is the code that I want to replace the function call avg(dim, i, j, src); with:

/*
 * avg - Returns averaged pixel value at (i,j)
 */
static pixel avg(int dim, int i, int j, pixel *src)
{
    int ii, jj;
    pixel_sum sum;
    pixel current_pixel;

    initialize_pixel_sum(&sum);
    for (ii = max(i-1, 0); ii <= min(i+1, dim-1); ii++)
        for (jj = max(j-1, 0); jj <= min(j+1, dim-1); jj++)
            accumulate_sum(&sum, src[RIDX(ii, jj, dim)]);

    assign_sum_to_pixel(&current_pixel, sum);
    return current_pixel;
}
+7  A: 

If your code base is small (say, 10-12 functions), you might try putting the inline keyword in front of each function.

Second, use a compiler option that inlines function calls instead of doing it by hand (that is why compilers exist). Which compiler are you using? You can look online for its option that inlines all function calls, if it has one.

Third, if you compile with GCC, you can give the function the always_inline attribute. GCC documents it for functions that are also declared inline, so declare it like this:

static inline pixel avg(int dim, int i, int j, pixel *src) __attribute__((always_inline));
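A self-contained sketch of that approach, assuming GCC or Clang (the attribute is a compiler extension) and assuming RIDX and pixel are defined as below, since the question does not show them. The functions are declared static inline as well as always_inline. To keep the example short, the pixel_sum helpers are folded into local counters; smooth_inline is a hypothetical name.

```c
#define RIDX(i, j, n) ((i) * (n) + (j))  /* assumed row-major index macro */

/* Assumed pixel layout; the question does not show it. */
typedef struct { unsigned short red, green, blue; } pixel;

/* GCC/Clang attribute on the declarations; the functions are also
   declared inline, which is what GCC documents always_inline for. */
static inline int min(int a, int b) __attribute__((always_inline));
static inline int max(int a, int b) __attribute__((always_inline));
static inline pixel avg(int dim, int i, int j, pixel *src)
    __attribute__((always_inline));

static inline int min(int a, int b) { return a < b ? a : b; }
static inline int max(int a, int b) { return a > b ? a : b; }

/* Same neighbourhood average as the question's avg(), with the
   pixel_sum helpers folded into local counters. */
static inline pixel avg(int dim, int i, int j, pixel *src)
{
    int ii, jj, r = 0, g = 0, b = 0, n = 0;
    pixel out;

    for (ii = max(i - 1, 0); ii <= min(i + 1, dim - 1); ii++)
        for (jj = max(j - 1, 0); jj <= min(j + 1, dim - 1); jj++) {
            r += src[RIDX(ii, jj, dim)].red;
            g += src[RIDX(ii, jj, dim)].green;
            b += src[RIDX(ii, jj, dim)].blue;
            n++;
        }
    out.red = (unsigned short) (r / n);
    out.green = (unsigned short) (g / n);
    out.blue = (unsigned short) (b / n);
    return out;
}

/* Hypothetical name; same loop as naive_smooth. */
void smooth_inline(int dim, pixel *src, pixel *dst)
{
    int i, j;

    for (i = 0; i < dim; i++)
        for (j = 0; j < dim; j++)
            dst[RIDX(i, j, dim)] = avg(dim, i, j, src);
}
```

You can check that the calls are gone by compiling with gcc -S and looking for any call to avg, min or max in the assembly generated for smooth_inline.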
Ashwin
+2  A: 

Use inline functions or macros: http://gcc.gnu.org/onlinedocs/cpp/Macros.html
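A minimal sketch of the macro half of that advice: min() and max() from the question rewritten as function-like macros. MIN, MAX and neighbour_rows are hypothetical names. Because they are pure expressions, the preprocessor replaces each use with the code itself, so no call remains.

```c
/* Function-like macros: the preprocessor pastes the expression in at
   each use, so no call is ever made. Each parameter and the whole body
   are parenthesized so that operator precedence survives expansion. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper: how many rows of the 3x3 window around row i
   lie inside a dim-row image (the bounds avg() loops over). */
int neighbour_rows(int i, int dim)
{
    return MIN(i + 1, dim - 1) - MAX(i - 1, 0) + 1;
}
```

The catch is that a macro argument can be evaluated twice: with int x = 1, MIN(x++, 10) yields 2 and leaves x equal to 3, whereas the function min(x++, 10) would return 1. Pass plain variables or side-effect-free expressions.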

Andrejs Cainikovs
+4  A: 
  1. If you are using a C99 compiler or a C++ compiler, you can use the inline keyword. However, it won't guarantee that the call is replaced with the actual code; the compiler does that only if it deems it more efficient.
  2. Otherwise, if you are using pure C89, avg() has to be a macro. Then you are guaranteed to have the function "call" replaced with the actual code.
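For the C89 route, avg() as a macro might look like the sketch below. It assumes RIDX and pixel are defined as shown (the question does not show them), prefixes the macro's locals with avg_ so they cannot capture the caller's variables, and wraps the body in do { ... } while (0) so the macro behaves like a single statement. AVG and macro_smooth are hypothetical names. The block uses only /* */ comments and declares variables at the top of each block, so it stays valid C89.

```c
#define RIDX(i, j, n) ((i) * (n) + (j))  /* assumed row-major index macro */

/* Assumed pixel layout; the question does not show it. */
typedef struct { unsigned short red, green, blue; } pixel;

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Macro version of avg(): stores the 3x3 (clipped) average around
   (i, j) into the pixel lvalue "out". i, j and dim are evaluated
   several times, so pass plain variables. */
#define AVG(dim, i, j, src, out)                                     \
    do {                                                             \
        int avg_ii, avg_jj;                                          \
        int avg_r = 0, avg_g = 0, avg_b = 0, avg_n = 0;              \
        for (avg_ii = MAX((i) - 1, 0);                               \
             avg_ii <= MIN((i) + 1, (dim) - 1); avg_ii++)            \
            for (avg_jj = MAX((j) - 1, 0);                           \
                 avg_jj <= MIN((j) + 1, (dim) - 1); avg_jj++) {      \
                avg_r += (src)[RIDX(avg_ii, avg_jj, (dim))].red;     \
                avg_g += (src)[RIDX(avg_ii, avg_jj, (dim))].green;   \
                avg_b += (src)[RIDX(avg_ii, avg_jj, (dim))].blue;    \
                avg_n++;                                             \
            }                                                        \
        (out).red = (unsigned short) (avg_r / avg_n);                \
        (out).green = (unsigned short) (avg_g / avg_n);              \
        (out).blue = (unsigned short) (avg_b / avg_n);               \
    } while (0)

/* Same loop as naive_smooth, with the "call" expanded in place. */
void macro_smooth(int dim, pixel *src, pixel *dst)
{
    int i, j;

    for (i = 0; i < dim; i++)
        for (j = 0; j < dim; j++)
            AVG(dim, i, j, src, dst[RIDX(i, j, dim)]);
}
```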
Alex B
+2  A: 

I have to say I agree with the approach of making sure you're using compiler optimizations and inline... but if you still want an answer to your specific question, I think what you're getting at is something like:

for (j = 0; j < dim; j++)
{

    /* ...avg() code body except for the return... */ 

    dst[RIDX(i, j, dim)] = current_pixel;
}
Kilo
Actually... this is exactly what I'm looking for. Now, do I basically do the same with the rest of the functions inside that avg() code body?
I'm Jim Caviezel too
I posted code at the bottom to see if it is the right code...
I'm Jim Caviezel too
A: 

/* mysmooth - my smooth */

char mysmooth_descr[] = "my smooth: My smooth";

void mysmooth(int dim, pixel *src, pixel *dst)
{
    int i, j;
    int ii, jj;
    pixel_sum sum;
    pixel current_pixel;

    for (i = 0; i < dim; i++)
        for (j = 0; j < dim; j++)
        {
            initialize_pixel_sum(&sum);
            for (ii = max(i-1, 0); ii <= min(i+1, dim-1); ii++)
                for (jj = max(j-1, 0); jj <= min(j+1, dim-1); jj++)
                    accumulate_sum(&sum, src[RIDX(ii, jj, dim)]);

            assign_sum_to_pixel(&current_pixel, sum);
            dst[RIDX(i, j, dim)] = current_pixel;
        }
}

So is this what my code should look like after I take the code from avg() and use it to replace the function call?

I'm Jim Caviezel too
This is not an answer. You should edit your post instead and delete this non-answer.
Tamás Szelei
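For comparison with the code above, here is a sketch with the remaining helper calls inside avg() expanded as well, as the earlier comment asks. It assumes RIDX and pixel are defined as shown (the question does not show them) and uses the question's pixel_sum; each block is commented with the call it replaces. min() and max() become conditional expressions, and because i and j do not change inside the window loops, the bounds are computed once per pixel instead of on every loop test, which is equivalent to the original. mysmooth_inlined is a hypothetical name.

```c
#define RIDX(i, j, n) ((i) * (n) + (j))  /* assumed row-major index macro */

/* Assumed pixel layout; the question does not show it. */
typedef struct { unsigned short red, green, blue; } pixel;

/* From the question. */
typedef struct { int red, green, blue, num; } pixel_sum;

void mysmooth_inlined(int dim, pixel *src, pixel *dst)
{
    int i, j, ii, jj;
    int ii_lo, ii_hi, jj_lo, jj_hi;
    pixel_sum sum;
    pixel current_pixel;

    for (i = 0; i < dim; i++)
        for (j = 0; j < dim; j++) {
            /* was: initialize_pixel_sum(&sum); */
            sum.red = sum.green = sum.blue = 0;
            sum.num = 0;

            /* was: max(i-1, 0), min(i+1, dim-1), max(j-1, 0), min(j+1, dim-1) */
            ii_lo = (i - 1 > 0) ? i - 1 : 0;
            ii_hi = (i + 1 < dim - 1) ? i + 1 : dim - 1;
            jj_lo = (j - 1 > 0) ? j - 1 : 0;
            jj_hi = (j + 1 < dim - 1) ? j + 1 : dim - 1;

            for (ii = ii_lo; ii <= ii_hi; ii++)
                for (jj = jj_lo; jj <= jj_hi; jj++) {
                    /* was: accumulate_sum(&sum, src[RIDX(ii, jj, dim)]); */
                    pixel p = src[RIDX(ii, jj, dim)];
                    sum.red += (int) p.red;
                    sum.green += (int) p.green;
                    sum.blue += (int) p.blue;
                    sum.num++;
                }

            /* was: assign_sum_to_pixel(&current_pixel, sum); */
            current_pixel.red = (unsigned short) (sum.red / sum.num);
            current_pixel.green = (unsigned short) (sum.green / sum.num);
            current_pixel.blue = (unsigned short) (sum.blue / sum.num);

            dst[RIDX(i, j, dim)] = current_pixel;
        }
}
```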
A: 

I unrolled the first and last iterations of the loops (the border rows and columns) to eliminate min() and max() from the code:

void smooth_B(int dim, pixel src[dim][dim], pixel dst[dim][dim]){
  dst[0][0].red  =(src[0][0].red  +src[1][0].red  +src[0][1].red  +src[1][1].red  )/4;
  dst[0][0].green=(src[0][0].green+src[1][0].green+src[0][1].green+src[1][1].green)/4;
  dst[0][0].blue =(src[0][0].blue +src[1][0].blue +src[0][1].blue +src[1][1].blue )/4;
  for( int j=1; j<dim-1; j++){
    dst[0][j].red  =(src[0][j-1].red  +src[1][j-1].red  +src[0][j].red  +src[1][j].red  +src[0][j+1].red  +src[1][j+1].red  )/6;
    dst[0][j].green=(src[0][j-1].green+src[1][j-1].green+src[0][j].green+src[1][j].green+src[0][j+1].green+src[1][j+1].green)/6;
    dst[0][j].blue =(src[0][j-1].blue +src[1][j-1].blue +src[0][j].blue +src[1][j].blue +src[0][j+1].blue +src[1][j+1].blue )/6;
  }
  dst[0][dim-1].red  =(src[0][dim-2].red  +src[1][dim-2].red  +src[0][dim-1].red  +src[1][dim-1].red  )/4;
  dst[0][dim-1].green=(src[0][dim-2].green+src[1][dim-2].green+src[0][dim-1].green+src[1][dim-1].green)/4;
  dst[0][dim-1].blue =(src[0][dim-2].blue +src[1][dim-2].blue +src[0][dim-1].blue +src[1][dim-1].blue )/4;

  for( int i=1; i<dim-1; i++){
    dst[i][0].red  =(src[i-1][0].red  +src[i-1][1].red  +src[i][0].red  +src[i][1].red  +src[i+1][0].red  +src[i+1][1].red  )/6;
    dst[i][0].green=(src[i-1][0].green+src[i-1][1].green+src[i][0].green+src[i][1].green+src[i+1][0].green+src[i+1][1].green)/6;
    dst[i][0].blue =(src[i-1][0].blue +src[i-1][1].blue +src[i][0].blue +src[i][1].blue +src[i+1][0].blue +src[i+1][1].blue )/6;
    for( int j=1; j<dim-1; j++){
      dst[i][j].red  =(src[i-1][j-1].red  +src[i][j-1].red  +src[i+1][j-1].red  +src[i-1][j].red  +src[i][j].red  +src[i+1][j].red  +src[i-1][j+1].red  +src[i][j+1].red  +src[i+1][j+1].red  )/9;
      dst[i][j].green=(src[i-1][j-1].green+src[i][j-1].green+src[i+1][j-1].green+src[i-1][j].green+src[i][j].green+src[i+1][j].green+src[i-1][j+1].green+src[i][j+1].green+src[i+1][j+1].green)/9;
      dst[i][j].blue =(src[i-1][j-1].blue +src[i][j-1].blue +src[i+1][j-1].blue +src[i-1][j].blue +src[i][j].blue +src[i+1][j].blue +src[i-1][j+1].blue +src[i][j+1].blue +src[i+1][j+1].blue )/9;
    }
    dst[i][dim-1].red  =(src[i-1][dim-2].red  +src[i][dim-2].red  +src[i+1][dim-2].red  +src[i-1][dim-1].red  +src[i][dim-1].red  +src[i+1][dim-1].red  )/6;
    dst[i][dim-1].green=(src[i-1][dim-2].green+src[i][dim-2].green+src[i+1][dim-2].green+src[i-1][dim-1].green+src[i][dim-1].green+src[i+1][dim-1].green)/6;
    dst[i][dim-1].blue =(src[i-1][dim-2].blue +src[i][dim-2].blue +src[i+1][dim-2].blue +src[i-1][dim-1].blue +src[i][dim-1].blue +src[i+1][dim-1].blue )/6;
  }
  dst[dim-1][0].red  =(src[dim-2][0].red  +src[dim-2][1].red  +src[dim-1][0].red  +src[dim-1][1].red  )/4;
  dst[dim-1][0].green=(src[dim-2][0].green+src[dim-2][1].green+src[dim-1][0].green+src[dim-1][1].green)/4;
  dst[dim-1][0].blue =(src[dim-2][0].blue +src[dim-2][1].blue +src[dim-1][0].blue +src[dim-1][1].blue )/4;
  for( int j=1; j<dim-1; j++){
    dst[dim-1][j].red  =(src[dim-2][j-1].red  +src[dim-1][j-1].red  +src[dim-2][j].red  +src[dim-1][j].red  +src[dim-2][j+1].red  +src[dim-1][j+1].red  )/6;
    dst[dim-1][j].green=(src[dim-2][j-1].green+src[dim-1][j-1].green+src[dim-2][j].green+src[dim-1][j].green+src[dim-2][j+1].green+src[dim-1][j+1].green)/6;
    dst[dim-1][j].blue =(src[dim-2][j-1].blue +src[dim-1][j-1].blue +src[dim-2][j].blue +src[dim-1][j].blue +src[dim-2][j+1].blue +src[dim-1][j+1].blue )/6;
  }
  dst[dim-1][dim-1].red  =(src[dim-2][dim-2].red  +src[dim-1][dim-2].red  +src[dim-2][dim-1].red  +src[dim-1][dim-1].red  )/4;
  dst[dim-1][dim-1].green=(src[dim-2][dim-2].green+src[dim-1][dim-2].green+src[dim-2][dim-1].green+src[dim-1][dim-1].green)/4;
  dst[dim-1][dim-1].blue =(src[dim-2][dim-2].blue +src[dim-1][dim-2].blue +src[dim-2][dim-1].blue +src[dim-1][dim-1].blue )/4;
}

By my measurements, it is about 50% faster than the original code. The next step is to eliminate repeated calculations.
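One way to take that next step, sketched under assumed RIDX and pixel definitions and not measured: the 3x3 mean is separable, so for each output row you can first sum every column over the (up to) three source rows, then add up to three neighbouring column sums per output pixel. Each column sum is reused by the output pixels that overlap it, so an interior pixel costs about six additions per channel instead of nine, and the borders are handled by the clipped window bounds rather than by separate code. smooth_colsums is a hypothetical name; it needs a scratch buffer of 3 * dim ints.

```c
#include <stdlib.h>

#define RIDX(i, j, n) ((i) * (n) + (j))  /* assumed row-major index macro */

/* Assumed pixel layout; the question does not show it. */
typedef struct { unsigned short red, green, blue; } pixel;

/* Hypothetical name. Returns 0 on success, -1 if the scratch buffer
   cannot be allocated. src and dst must not overlap. */
int smooth_colsums(int dim, const pixel *src, pixel *dst)
{
    int *cr, *cg, *cb;  /* per-column sums for the current output row */
    int i, j;

    if (dim <= 0)
        return 0;
    cr = malloc(3 * (size_t) dim * sizeof *cr);
    if (cr == NULL)
        return -1;
    cg = cr + dim;
    cb = cg + dim;

    for (i = 0; i < dim; i++) {
        int lo = i > 0 ? i - 1 : 0;
        int hi = i < dim - 1 ? i + 1 : dim - 1;
        int rows = hi - lo + 1;  /* 2 on the top/bottom row, else 3 */

        /* Vertical pass: sum each column over source rows lo..hi. */
        for (j = 0; j < dim; j++) {
            int r = 0, g = 0, b = 0, ii;
            for (ii = lo; ii <= hi; ii++) {
                const pixel *p = &src[RIDX(ii, j, dim)];
                r += p->red;
                g += p->green;
                b += p->blue;
            }
            cr[j] = r;
            cg[j] = g;
            cb[j] = b;
        }

        /* Horizontal pass: each column sum is reused by up to three
           neighbouring output pixels instead of being re-added. */
        for (j = 0; j < dim; j++) {
            int jl = j > 0 ? j - 1 : 0;
            int jh = j < dim - 1 ? j + 1 : dim - 1;
            int n = rows * (jh - jl + 1);  /* pixels in the window */
            int r = 0, g = 0, b = 0, jj;
            for (jj = jl; jj <= jh; jj++) {
                r += cr[jj];
                g += cg[jj];
                b += cb[jj];
            }
            dst[RIDX(i, j, dim)].red = (unsigned short) (r / n);
            dst[RIDX(i, j, dim)].green = (unsigned short) (g / n);
            dst[RIDX(i, j, dim)].blue = (unsigned short) (b / n);
        }
    }

    free(cr);
    return 0;
}
```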

sambowry