views:

148

answers:

5

I have three questions about three nested loops:

for (int x=0; x<400; x++)
{
    for (int y=0; y<300; y++)
    {
        for (int z=0; z<400; z++)
        {
             // compute and store value
        }
    }
}

And I need to store all computed values. My standard approach would be to use a 3D-array:

values[x][y][z] = 1; // test value

but this turns out to be slow: it takes 192 ms to complete this loop, where a single int-assignment

int value = 1; // test value

takes only 66 ms.

1) Why is an array so relatively slow?
2) And why does it get even slower when I put this in the inner loop:

values[z][y][x] = 1; // (notice x and z switched)

This takes more than 4 seconds!

3) Most importantly: Can I use a data structure that is as quick as the assignment of a single integer, but can store as much data as the 3D-array?

+1  A: 

I would guess that this has alot to do with caching and registers and the principle of memory locality.

Java has to access thousands of more bytes of memory when storing into an array. With the single variable, it can just keep that value in the cache and just keep updating it.

The cache isn't large enough to hold the whole multi-dimensional array, so Java has to keep updating the cache to and from memory. Cache access times are way faster than memory access times.

I don't even see why you would do this test though. If you need to store lots of data in a multi-dimensional array, using a single variable isn't helpful, even if it is faster.

Also, the reason that when the parameters are switched around when accessing the array is because you are jumping around in memory a lot more (lots more cache misses) than when you are just iterating the other way.

jjnguy
+1 for "I don't even see why you would do this test" -- this submarine is much faster than this parachute, so which one should I use to safely fall from an airplane?
Jim Kiley
Haha thanks @Jim
jjnguy
I don't agree. I used to use a 4D-array (int[400][300][400][3]) to store rgb values. But now I know this is slower than using three different int[400][300][400] arrays. In a language like PHP I would even consider making 400x300x400 different single integers and calling them dynamically, if that's faster - and possible memorywise.
RemiX
This isn't something you can agree or disagree with. It isn't an opinion. Arrays are contiguous blocks of memory. This doesn't change in multi-dimensional arrays. Changing from having a 4-dimensional array to 3, 3-dimensional arrays shouldn't change the access time of the data in those arrays.
jjnguy
"In a language like PHP I would even consider making 400x300x400 different single integers and calling them dynamically, if that's faster" What! (@RemiX)
jjnguy
+2  A: 
public static void main( String[] args ) {

    int[][][] storage = new int[ 400 ][ 300 ][ 400 ];
    long start = System.currentTimeMillis();

    for ( int x = 0; x < 400; x++ ) {
        for ( int y = 0; y < 300; y++ ) {
            for ( int z = 0; z < 400; z++ ) {
                storage[x][y][z] = 5;
            }
        }
    }

    long end = System.currentTimeMillis();
    System.out.println( "Time was: " + ( end - start ) / 1000.0 + " seconds." );


}

Ran with -Xmx1g

Time was: 0.188 seconds.

That seems pretty darn fast.. you are looking at 48 MILLION elements in the innermost loop.

Homerolling a silly little datastructure..

public static void main( String[] args ) {

    StorerGuy[] storerGuys = new StorerGuy[ 400 ];

    long start = System.currentTimeMillis();

    for ( int x = 0; x < 400; x++ ) {
        for ( int y = 0; y < 300; y++ ) {
            for ( int z = 0; z < 400; z++ ) {
                storerGuys[x] = new StorerGuy( x, y, z, 5 );

            }
        }
    }

    long end = System.currentTimeMillis();
    System.out.println( "Time was: " + ( end - start ) / 1000.0 + " seconds." );

}

public static class StorerGuy {

    public int x;
    public int y;
    public int z;
    public int value;

    StorerGuy( int x, int y, int z, int value ) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.value = value;
    }

}

Time was: 0.925 seconds.

Which is faster than 4 seconds you had in your mixed up order example.

I think multi arrays are too much for the problem. You are better off with a more complex data structure, as it will keep the stuff all in 1 memory location (x,y,z,value).

Java is an OO language. In most cases, you should use objects and not weird data structures like int[][][]

bwawok
So as long as I don't have to switch the indices, the int[][][] approach is faster.
RemiX
A: 

Considering that the array is huge, the amount of memory used, the indirections needed (a multidimensional array is a arrays of reference to arrays...), that does not seem at all slow to me. When you switch x and z you are probably trashing the cache.

For comparison, you could store everything in a flat array.... This would improve the storing speed... but then the retrieval would be more complex and much more slow.

int k = 0;
for (int x=0; x<400; x++)
{
    for (int y=0; y<300; y++)
    {
        for (int z=0; z<400; z++)
        {
             // compute and store value
             arr[k++] = val;
        }
    }
}
leonbloy
Flat array will be just as fast as a multi-dimensional iterated in the correct order.
jjnguy
True, I say slow, but I just compared it to something faster ;)Unfortunately, the flat array seems slower:values[x*400*300+y*300+d] = 1); took 300 ms
RemiX
@jjnguy, boundary checks may be less (only 1 instead of 3) for the one-flat-array version. May be optimized away by JIT, not sure.
tucuxi
it's slower if you do the calculation (multiplies and sums), it's not if you keep an increasing index, as I wrote. But jinguy is right, this is about as fast as the standard multi-dimensional
leonbloy
did the experiment, posted times above - doing the math on a flat array seems about 10% faster than the triple-nested array (it is highly possible that JIT optimizes some of the math out of the most internal loop). Of course, in a real application, access times will probably be dwarfed by the actual calculations...
tucuxi
A: 

Have you tried this:

Object[][][] store = new Object[ 400 ][300][400];

for (int x=0; x<400; x++)
{
    Object[][] matrix = store[x];

    for (int y=0; y<300; y++)
    {
        Object[] line = matrix[y];
        for (int z=0; z<400; z++)
        {
             // compute and store value
             line[z] = // result;
        }
    }
}

it might improve your cache thrashing.

Toader Mihai Claudiu
Right now, I get an OutOfMemoryError and I need to figure out how to increase the heap space, but I'll keep your suggestion in mind for when I do.
RemiX
java -Xmx=512m :)
Toader Mihai Claudiu
+1  A: 

1) Why is an array so relatively slow?

As others pointed, you are comparing apples to oranges. The triple-array is slow because it needs to dereference (internally at least - yes, "there are no pointers in Java") three times; but then again, you cannot reference a single integer variable...

2) And why does it get even slower when I put this in the inner loop:

values[z][y][x] = 1; // (notice x and z switched)

Because you have decreased cache coherence. The fastest-changing indices should be the last ones, so that most memory accesses occur next to each other, within the same cache blocks, instead of forcing your processor to wait until the blocks are read from the main RAM.

3) Most importantly: Can I use a data structure that is as quick as the assignment of a single integer, but can store as much data as the 3D-array?

No. There is no such structure, since the integer variable fits into a machine register (quicker even than the processor's memory cache), and can always be accessed faster than anything else you care to mention. Processor speeds are much, much faster then main memory speeds. If your 'working set' (the data that you need to operate on) does not fit into registers or cache, you will have to pay a penalty to fetch it from RAM (or even worse, disk).

This being said, Java does boundary checks on each array access, and does not seem to be too smart about optimizing the boundary checks away. The following comparison may be of interest:

public static long test1(int[][][] array) {
    long start = System.currentTimeMillis();
    for ( int x = 0; x < 400; x++ ) {
        for ( int y = 0; y < 300; y++ ) {
            for ( int z = 0; z < 400; z++ ) {
                array[x][y][z] = x + y + z;
            }
        }
    }
    return System.currentTimeMillis() - start;
}

public static long test2(int [] array) {
    long start = System.currentTimeMillis();
    for ( int x = 0; x < 400; x++ ) {
        for ( int y = 0; y < 300; y++ ) {
            for ( int z = 0; z < 400; z++ ) {
                array[z + y*400 + x*400*300] = x + y + z;
            }
        }
    }
    return System.currentTimeMillis() - start;
}

public static void main(String[] args) {

    int[][][] a1 = new int[400][300][400];
    int[] a2 = new int[400*300*400];
    int n = 20;

    System.err.println("test1");
    for (int i=0; i<n; i++) {
        System.err.print(test1(a1) + "ms ");
    }
    System.err.println();
    System.err.println("test2");
    for (int i=0; i<n; i++) {
        System.err.print(test2(a2) + "ms ");
    }
    System.err.println();
}

The output, on my system, is

test1
164ms 177ms 148ms 149ms 148ms 147ms 150ms 151ms 152ms 154ms 151ms 150ms 148ms 148ms 150ms 148ms 150ms 148ms 148ms 149ms 
test2
141ms 153ms 130ms 130ms 130ms 133ms 130ms 130ms 130ms 132ms 129ms 131ms 130ms 131ms 131ms 130ms 131ms 130ms 130ms 130ms

Therefore, there is some room for improvement... but I really don't think it is worth your while.

tucuxi