Hi.

I have the code:

float *mu_x_ptr;
__m128 *tmp;
__m128 *mm_mu_x;

mu_x_ptr = _aligned_malloc(4*sizeof(float), 16);
mm_mu_x = (__m128*) mu_x_ptr;
for(row = 0; row < ker_size; row++) {
    tmp = (__m128*) &original[row*width + col];
    *mm_mu_x = _mm_add_ps(*tmp, *mm_mu_x);
}

From this I get:

First-chance exception at 0x00ad192e in SSIM.exe: 0xC0000005: Access violation reading location 0x00000000.
Unhandled exception at 0x00ad192e in SSIM.exe: 0xC0000005: Access violation reading location 0x00000000.
The program '[4452] SSIM.exe: Native' has exited with code -1073741819 (0xc0000005)

when running the program. The error occurs at the _mm_add_ps line.

original is allocated with _aligned_malloc(..., 16) as well and then passed to the function, so as far as I understand SSE, the problem shouldn't be that it is not aligned.

I'm wondering if anyone can see why this crashes, since I can't see why.

EDIT: col is either 0 or 4, and width is always a multiple of 4.

EDIT2: Looks like my original array is not aligned. Wouldn't:

function(float *original);
.
.
.
    original = _aligned_malloc(width*height*sizeof(float), 16);
    function(original);
    _aligned_free(original);
}

make sure that original is aligned inside function?

Edit3: This is actually really weird. When I do:

float *orig;
orig = _aligned_malloc(width*height*sizeof(float), 16);
assert(isAligned(orig));

The assert fails, with isAligned defined as:

#define isAligned(p) (((unsigned long)(p)) & 15 == 0)
+1  A: 

tmp will be misaligned unless width and col have suitable values. Ideally both width and col should be multiples of 4.

You might want to add some asserts to check the alignment, e.g.

#define IsAligned(p) ((((unsigned long)(p)) & 15) == 0)

float *mu_x_ptr;
__m128 *tmp;
__m128 *mm_mu_x;

assert(original != NULL && IsAligned(original));
mu_x_ptr = _aligned_malloc(4 * sizeof(float), 16);
assert(mu_x_ptr != NULL && IsAligned(mu_x_ptr));
mm_mu_x = (__m128 *)mu_x_ptr;
assert(IsAligned(mm_mu_x));
for (row = 0; row < ker_size; row++)
{
    tmp = (__m128 *)&original[row * width + col];
    assert(IsAligned(tmp));
    *mm_mu_x = _mm_add_ps(*tmp, *mm_mu_x);
}
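
A note on why the isAligned assert from the question's Edit3 fails even for memory that really is aligned: in C, == binds tighter than &, so `p & 15 == 0` is parsed as `p & (15 == 0)`, which is `p & 0` and therefore always 0. The sketch below only illustrates that difference. The names IS_ALIGNED_BROKEN, IS_ALIGNED_16, broken_says_aligned and fixed_says_aligned are made up for this example, and uintptr_t is used instead of unsigned long as an assumption about what is safest on 64-bit targets.

```c
#include <assert.h>
#include <stdint.h>

/* How the compiler parses the macro from the question: the comparison
   15 == 0 is evaluated first and yields 0, so the result is always 0. */
#define IS_ALIGNED_BROKEN(p) (((uintptr_t)(p)) & (15 == 0))

/* Parenthesised version: mask the low four bits first, then compare. */
#define IS_ALIGNED_16(p) ((((uintptr_t)(p)) & 15) == 0)

/* Hypothetical helpers so both macros can be checked side by side. */
int broken_says_aligned(const void *p)
{
    return IS_ALIGNED_BROKEN(p) != 0;
}

int fixed_says_aligned(const void *p)
{
    return IS_ALIGNED_16(p) != 0;
}
```

With the broken form the assert fires for every pointer, so it says nothing about whether _aligned_malloc did its job.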
Paul R
Sorry, forgot to mention. col is either 0 or 4, and width is always a multiple of 4. Editing it now.
martiert
Sorry - our edits clashed - I was fixing the formatting etc - you might want to add the part about col and width again.
Paul R
Looks like my edit worked either way.
martiert
Oh yes - missed that. Have now added further suggestion re asserts in my answer above.
Paul R
Hmmm... This fails after fetching the data, so the alignment has to be wrong then. But won't _aligned_malloc(width*height*sizeof(float), 16); allocate aligned memory, so that I just have to pass original normally to the function, which can then be something like function(float *original);?
martiert
+2  A: 

I think you need to use

__m128 tmp = _mm_load_ps( &original[row * width + col] );

instead of

tmp = (__m128 *)&original[row * width + col];

EDIT: If the access violations only appear after some offset, then possibly your stride is not aligned. Either way, allocate __m128 elements (each of which represents 4 floats). This takes care of the alignment.

Also you can get some extra performance by eliminating the arithmetic [row * width + col]. Determine your stride and increment your pointer accordingly.
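
A minimal sketch of both suggestions, under some assumptions: the function name sum_rows4 and its parameters are hypothetical, not taken from the question. It uses _mm_loadu_ps, which accepts any address, where _mm_load_ps would require 16-byte alignment, so it also works when the stride or column offset is not a multiple of 4 floats. The accumulator starts from _mm_setzero_ps() because memory returned by _aligned_malloc is not zero-initialised, and the pointer is advanced by the stride instead of recomputing row * width + col.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical helper: sums `rows` rows of four adjacent floats starting at
   column `col` and writes the four column sums to out[0..3]. */
void sum_rows4(const float *image, size_t width, size_t col,
               size_t rows, float out[4])
{
    assert(image != NULL && out != NULL && col + 4 <= width);

    __m128 acc = _mm_setzero_ps();   /* explicit zero start value */
    const float *p = image + col;    /* first group of four floats */

    for (size_t r = 0; r < rows; r++) {
        /* Unaligned load: safe for any address. */
        acc = _mm_add_ps(acc, _mm_loadu_ps(p));
        p += width;                  /* step one row down (the stride) */
    }

    _mm_storeu_ps(out, acc);         /* unaligned store of the result */
}
```

If every row start and column offset is known to be 16-byte aligned, _mm_load_ps and _mm_store_ps can be substituted for the unaligned variants.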

renick
This did it. I had a bad stride, so now I do my operations on four pixels at a time. I will also implement the tip about incrementing my pointer.
martiert
Strange, using the arithmetic is actually quite a lot faster than incrementing the pointer.
martiert
This is indeed strange. Are you incrementing by a constant?
renick