ansaurus

Question

Unknown GCC error, while compiling for ARM NEON (Critical)

Answer 1

+2 A:

I can't test this because I don't have the toolchain for it, but this type of error can often be worked around by rewording the code a little bit. Generally this shouldn't happen, and it should be reported as a bug, but you are using processor specific functionality, which is probably less well tested and polished than the rest of the compiler.

Since it is a register spill error and you've got several pointers involved I highly suspect that the compiler may be trying to load more data into registers than it needs to out of fear that there may be some aliasing going on (which probably isn't actually happening). Below I will deal with the possibility of that as well as do a few other things which may lessen the complexity of the code from the compiler's perspective (though it might not look like this is the case).

#include<stdio.h>
#include"arm_neon.h"

#define IMAGE_HEIGHT 480
#define IMAGE_WIDTH  640

float32_t integral_image[IMAGE_HEIGHT][IMAGE_WIDTH];

float32x4_t box_area_compute3(int, int , int , int , unsigned int , float);

inline int min(int, int);

int main()
{

 box_area_compute3(1, 1, 4, 4, 2, 0);

 return 0;
}

/* By putting these in separate functions the compiler will initially
 * think about them by themselves, without the complications of the
 * surrounding code.  This may give it the abiltiy to optimise the
 * code somewhat before trying to inline it.
 * This may also serve to make it more obvious to the compiler that
 * the local variables are dead after their use (since they are
 * dead after the call returns, and that the lifetimes of some variable
 * cannot actually overlap (hopefully reducing the register needs).
 */
static inline float32x4_t do_it2(float32_t *tl, float32_t *tr, float32_t *bl, float32_t * br) {
    float32x4x2_t top_left, top_right, bottom_left, bottom_right;
    float32x4_t A, B;

    top_left = vld2q_f32(tl);
    top_right = vld2q_f32(tr);
    bottom_left = vld2q_f32(bl);
    bottom_right = vld2q_f32(br);

    /* By spreading this across several statements I have created several
     * additional sequence points.  The compiler does not think that it
     * has to dereference all of the pointers before doing any of the
     * computations.... maybe. */
    A = vaddq_f32(*top_left.val, *bottom_right.val);
    B = vsubq_f32(A, *top_right.val);
    return vsubq_f32(B, *bottom_left);
}

static inline float32x4_t do_it4(float32_t *tl, float32_t *tr, float32_t *bl, float32_t * br) {
    float32x4x4_t top_left, top_right, bottom_left, bottom_right;
    float32x4_t A, B;

    top_left = vld4q_f32(tl);
    top_right = vld4q_f32(tr);
    bottom_left = vld4q_f32(bl);
    bottom_right = vld4q_f32(br);

    A = vaddq_f32(*top_left.val, *bottom_right.val);
    B = vsubq_f32(A, *top_right.val);
    return vsubq_f32(B, *bottom_left);
}

float32x4_t box_area_compute3(int row, int col, int num_rows, int num_cols, unsigned int step_size, float three)
{
 unsigned int height = IMAGE_HEIGHT;
 unsigned int width = IMAGE_WIDTH;

 int temp_row = row + num_rows;
 int temp_col = col + num_cols;

 int r1 = (min(row, height))- 1 ;
 int r2 = (min(temp_row, height)) - 1;

 int c1 = (min(col, width)) - 1;
 int c2 = (min(temp_col, width)) - 1;

 float32x4_t v128_areas;

     float32_t *tl = (float32_t *)integral_image[r1] + c1;
 float32_t *tr = (float32_t *)integral_image[r1] + c2;
 float32_t *bl = (float32_t *)integral_image[r2] + c1;
 float32_t *br = (float32_t *)integral_image[r2] + c2;


 switch (step_size) {
    case 2:
      v128_areas = do_it2(tl, tr, bl, br);
      break;

 case 4:
      v128_areas = do_it4(tl, tr, bl, br);
      break;
 }

 if(three == 3.0)
  v128_areas = vmulq_n_f32(v128_areas, three);

 return v128_areas;

}

inline int min(int X, int Y)
{
 return (X < Y ? X : Y);
}

I hope that this helps and that I did not introduce any errors.

nategoose 2010-09-28 16:22:47

Thanks for your all your efforts, but I feel sad to say that once again, I get the same error :(. I was under the same impression as you my friend, I had split the functionality, but the damn error doesn't seem to go. This is happening only to the second part (do_it4()), when we look carefully at the error report, it points subtly to neon_vld4qav4sf. :( I have reported this to Code Sourcery, hope they fix this issue and release a patch, SOON, I'm really running out of my time! :)

vikramtheone 2010-09-28 19:16:49

@vikramtheone: What happens if you remove `-mfloat-abi=hard` from the command line? I have no idea what effect this might have on your chip.

nategoose 2010-09-28 19:31:51

@nategoose,I have taken a different route, I wrote the do_it4(..) in assembly :), passed it to the assembler generated an object file, passed it to the linker and it works well. I feel good that this problem occurred(btw, the Codesourcery asked me to bring down the optimization flags from -O3 down to -O1, so it is a bug in gcc), by writing in assembly and disassembling the obj file, I saw that I have achieved the same results in far more denser code. And yes about the -mfloat-abi=hard thing, I tried soft and I was still getting the errors. I will post the assembly thing as the answer shortly.

vikramtheone 2010-09-30 06:25:25

Answer 2

A:

Well I had contacted the Code Sourcery about this problem and they have considered this as a bug in GCC compiler. So I wrote the do_it4(){.....} function in assembly instead of using teh intrinsics. Now it works good!

vikramtheone 2010-10-28 09:18:05

ansaurus

tags:

views:

answers:

Unknown GCC error, while compiling for ARM NEON (Critical)

related questions