views: 229
answers: 7

I am trying to optimize this sort of thing in a compute-heavy application:

say I have a

 double d[500][500][500][500];

and the following is quite costly, at least from the compiler's perspective:

 double d[x][y][j][k]

What I want to tell the compiler is that the memory is contiguous, to make computing the offset cheaper.

In my example,

I have something like this:

double n=0;
for (int i=0; i < someNumber; i++)
{
    n+=d[x][i][j][k] /*(some other math calculations)*/;
}

So I tried to optimize it by putting it in a separate function

void func( double*** const restrict dMatrix )
{
  /* and do some calculations here */
}

didn't help much :(

Any suggestions on optimizing it?


Edit

I cannot rewrite the code to make the array one-dimensional. I have to work with this multidimensional beast :(

+10  A: 

I suspect that the problem is not the offset calculation but the actual access to memory. When you declare a 4-dimensional array and access elements with adjacent indices at any level except the last one, the memory addresses are actually quite far from each other, and this leads to lots of cache misses and a significant slowdown.

sharptooth
+5  A: 

Note that this is a lot of data (around 466 GiB, if my math is right), so beware of swap and cache-access issues. If you're not actually using all 500^4 elements, you need to profile your application to see whether it's really the "indirection" that is costing you, performance-wise.

unwind
You can still do d[500][500][500][500] and access it your way. It makes no difference.
Pod
Your suggestion is exactly the same as the fake multi-dimensional array of C.
Georg
Removed the suggestion about explicit 1-dimensionality, since a[2][2] and a[2 * 2] are equivalent. Sorry for the confusion.
unwind
+5  A: 

The C compiler certainly knows when the memory is contiguous. You don't have to tell it.

Thomas Danecker
+3  A: 

There are no true multi-dimensional arrays in C. All arrays are one-dimensional; the compiler just computes the correct offset. This means you can't make it faster by calculating the offset yourself. This is a limitation of the C language.

You can probably speed it up by reducing the amount of cache misses. a[0][?][?][?] is probably far from a[1][?][?][?].

Georg
+3  A: 

As mentioned elsewhere, the memory is contiguous anyway, and the slowness comes from cache misses. To cut down on these, you want to ensure (if possible) that you are iterating over adjacent elements for maximum cache locality rather than making large jumps in memory. In C, this means that your most frequently iterated index should be the last dimension of the array, and the least frequently iterated one should be the first dimension: see the Wikipedia article on row-major order.

Kylotan
A: 

What you used to be able to do to speed things up was to use incrementing pointers to access the array.

So, using a simple array:

char aString[500];
for (int i=0; i<500; i++)
    aString[i] = 0;     // array access is really a multiply!

becomes

char aString[500];
char *aStringPtr;
for (aStringPtr = &aString[0]; aStringPtr < &aString[0] + 500; aStringPtr++)
    *aStringPtr = 0;

This runs about twice as fast as the first example.

Tony Lambert
Why does it run twice as fast?
vehomzzz
I think it's because in the 2nd example you're just incrementing an address, while in the first you're incrementing a value and adding it to an address.
Tony Lambert
I don't think there should be a performance difference when optimization is turned on...
Thomas Danecker
A: 

As unwind said, your array is about half a terabyte. You need a disk and a pagefile big enough, and probably an enormous amount of RAM as well. Finally, your cache size also matters, so the order in which you access elements will make a huge difference. Address calculation will be in the noise.

If this is actually a sparse array, you should treat it as such. In fact, organizing it with pointer arrays might be a good way to do that.

As it is, simply loading data into that array, if done in the fastest possible way, might take hours.

BTW, I hope you are on a 64-bit machine. A 32-bit address can only access about 4GB.

Mike Dunlavey