For example:

a) int[x][y][z] vs. b) int[x*y*z]

Initially I thought I'd go with a) for simplicity.

I know that Java doesn't store arrays linearly in memory like C does. But what implications does this have for my program?
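To make the two options concrete, this is roughly what the declarations would look like (the dimension names are placeholders from the question, not real code):

int[][][] multi  = new int[x][y][z];     // option a): nested ("jagged") arrays, the JVM handles the indexing
int[]     single = new int[x * y * z];   // option b): one flat array, indexed by hand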

+1  A: 

If you choose the latter route, then you're going to have to perform arithmetic for every single array access. That's going to be a pain and error-prone (unless you wrap it in a class providing this functionality).

I don't believe that there's any (significant) optimisation in choosing your flat array (especially given the arithmetic needed to index into it). As always with optimisations, you would need to perform some measurements and determine if it's really worth it.
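A minimal sketch of such a wrapper (hypothetical class and method names, just to show the idea of keeping the index arithmetic in one place):

class Int3DArray {
    private final int[] data;
    private final int ySize, zSize;

    Int3DArray(int xSize, int ySize, int zSize) {
        this.data = new int[xSize * ySize * zSize];
        this.ySize = ySize;
        this.zSize = zSize;
    }

    // Row-major mapping: all the error-prone arithmetic lives here.
    private int index(int x, int y, int z) {
        return (x * ySize + y) * zSize + z;
    }

    int get(int x, int y, int z)             { return data[index(x, y, z)]; }
    void set(int x, int y, int z, int value) { data[index(x, y, z)] = value; }
}

With something like this, the calling code reads almost like the nested-array version while still using a single contiguous array.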

Brian Agnew
Ok, thanks. I'm just going to use a 3-dimensional array, and if I have performance issues with it do a comparison.
Mikolan
If you use a multidimensional array, then you're going to have to perform several memory accesses for every single array access, which could be *much* slower than a little arithmetic. But yeah, with this kind of thing you really need to measure before you act.
Michael Borgwardt
+3  A: 

Use the first variant (3-dimensional) because it's easier to understand and there is less chance of making a logical error (especially if you're using it to model 3-dimensional space).

Roman
+7  A: 

Usually the best thing to do when looking for answers to such questions is to see how the choices are compiled into JVM bytecode:

int[][] multi = new int[50][50];
int[] single = new int[2500];

this is translated into

BIPUSH 50
BIPUSH 50
MULTIANEWARRAY int[][] 2
ASTORE 1
SIPUSH 2500
NEWARRAY T_INT
ASTORE 2

So, as you can see, the JVM already knows that we are talking about a multidimensional array.

Going further:

for (int i = 0; i < 50; ++i)
    for (int j = 0; j < 50; ++j)
    {
        multi[i][j] = 20;
        single[i*50+j] = 20;
    }

this is translated (skipping the loop instructions) into:

ALOAD 1: multi
ILOAD 3: i
AALOAD
ILOAD 4: j
BIPUSH 20
IASTORE

ALOAD 2: single
ILOAD 3: i
BIPUSH 50
IMUL
ILOAD 4: j
IADD
BIPUSH 20
IASTORE

So, as you can see, the multidimensional array is handled natively by the VM, with no overhead from useless instructions, while the single-dimensional version needs more instructions because the offset is calculated by hand.

I don't think that performance will be such an issue.
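For reference, here are the fragments above combined into one compilable class (a sketch; the original compilation unit isn't shown). Compiling it and running javap -c on the class file produces a listing similar to the one quoted:

public class ArrayLayoutDemo {                 // hypothetical class name
    public static void main(String[] args) {
        int[][] multi = new int[50][50];       // stored in local slot 1
        int[] single = new int[2500];          // stored in local slot 2

        for (int i = 0; i < 50; ++i)
            for (int j = 0; j < 50; ++j) {
                multi[i][j] = 20;
                single[i * 50 + j] = 20;
            }
    }
}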

EDIT:

I did some simple benchmarks to see what's going on here. I tried three different cases: linear write, linear read and random access. Times are expressed in milliseconds (and measured using System.nanoTime()). Here are the results:

Linear write

  • Size: 100x100 (10000) Multi: 5.786591 Single: 6.131748
  • Size: 200x200 (40000) Multi: 1.216366 Single: 0.782041
  • Size: 500x500 (250000) Multi: 7.177029 Single: 3.667017
  • Size: 1000x1000 (1000000) Multi: 30.508131 Single: 18.064592
  • Size: 2000x2000 (4000000) Multi: 185.3548 Single: 155.590313
  • Size: 5000x5000 (25000000) Multi: 955.5299 Single: 923.264417
  • Size: 10000x10000 (100000000) Multi: 4084.798753 Single: 4015.448829

Linear read

  • Size: 100x100 (10000) Multi: 5.241338 Single: 5.135957
  • Size: 200x200 (40000) Multi: 0.080209 Single: 0.044371
  • Size: 500x500 (250000) Multi: 0.088742 Single: 0.084476
  • Size: 1000x1000 (1000000) Multi: 0.232095 Single: 0.167671
  • Size: 2000x2000 (4000000) Multi: 0.481683 Single: 0.33321
  • Size: 5000x5000 (25000000) Multi: 1.222339 Single: 0.828118
  • Size: 10000x10000 (100000000) Multi: 2.496302 Single: 1.650691

Random read

  • Size: 100x100 (10000) Multi: 22.317393 Single: 8.546134
  • Size: 200x200 (40000) Multi: 32.287669 Single: 11.022383
  • Size: 500x500 (250000) Multi: 189.542751 Single: 68.181343
  • Size: 1000x1000 (1000000) Multi: 1124.78609 Single: 272.235584
  • Size: 2000x2000 (4000000) Multi: 6814.477101 Single: 1091.998395
  • Size: 5000x5000 (25000000) Multi: 50051.306239 Single: 7028.422262

But the random case is a little bit misleading, since it generates two random numbers per access for the multi-dimensional array but just one for the single-dimensional array (and PRNGs may consume some CPU).
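One way to take the PRNG cost out of the comparison would be to pre-generate the same coordinates for both versions (a sketch with hypothetical names, not the benchmark code actually used):

java.util.Random rnd = new java.util.Random(42);
int accesses = 1000000;
int[] xs = new int[accesses];
int[] ys = new int[accesses];
for (int k = 0; k < accesses; k++) {
    xs[k] = rnd.nextInt(50);
    ys[k] = rnd.nextInt(50);
}

// Both loops now pay exactly the same random-number cost (none),
// so only the array accesses are being compared.
long sum = 0;
for (int k = 0; k < accesses; k++)
    sum += multi[xs[k]][ys[k]];
for (int k = 0; k < accesses; k++)
    sum += single[xs[k] * 50 + ys[k]];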

Note that I tried to let the JIT warm up by benchmarking only after the 20th run of the same loop. For completeness, my Java VM is the following:

java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)

Jack
Always nice to see someone look at the reality under the hood instead of just making assumptions. I'd give you +100 if I could.
JUST MY correct OPINION
By the time the code is jitted, the number of JVM instructions is irrelevant. What matters is how much actual time the code takes to run, which will depend on things like locality, dereferencing, and memory usage.
Gabe
Please update the random read benchmark so that it generates 2 random numbers for both versions. Probably the single-array version will even then be faster, because it requires fewer memory lookups (random reads will produce the most cache misses), but you can never be sure before measuring it.
Esko Luontola
In the first part of your message you conclude that the bytecodes are similar and that there will be no performance difference, but then the benchmarks in the latter part of your message prove your original assumption wrong. That reinforces the idea that "premature optimization is the root of all evil", because trying to guess performance rarely works. :) I added benchmarks for 3-dimensional arrays to my answer, and also took into consideration the overhead of generating random numbers.
Esko Luontola
Actually, from the bytecodes you showed, it's possible to see that the multidimensional array can be slower: it requires 2 heap accesses (AALOAD and IASTORE) whereas the single-dimensional version requires only 1 heap access (IASTORE). All the other instructions operate on values on the stack (which will be in the cache or registers) or do arithmetic, so they are very fast.
Esko Luontola
+3  A: 

On current CPUs, non-cached memory access is hundreds of times slower than arithmetic (see this presentation). The a) option will result in about 3 memory lookups per access whereas the b) option will result in about 1 memory lookup. So the b) option can be faster in some situations (when it's a hot spot and the array does not fit into the CPU's cache). How much faster? That will depend on the application.
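To spell out where those lookups come from (a rough illustration; ySize and zSize are hypothetical dimension variables):

int v1 = multi[x][y][z];                        // ~3 dependent heap reads: multi[x], then multi[x][y], then the int element
int v2 = single[(x * ySize + y) * zSize + z];   // ~1 heap read; the index arithmetic stays in registers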

Personally I would first use the a) option, because it will result in simpler code. If a profiler shows the array access to be a bottleneck, then I would convert it to the b) option, so that there is a pair of helper methods for reading and writing array values (that way the messy code will be restricted to those two methods).

UPDATE:

I made a benchmark for comparing 3-dimensional int arrays ("Multi" column) to the equivalent 1-dimensional int arrays ("Single" column). The code is here and tests here. I ran it on 64-bit jdk1.6.0_18, Windows 7 x64, Core 2 Quad Q6600 @ 3.0 GHz, 4 GB DDR2, using the JVM options -server -Xmx3G -verbose:gc -XX:+PrintCompilation (I have removed the debug output from the following results). The results were:

Out of 20 repeats, the minimum time in milliseconds is reported.

Array dimensions: 100x100x100 (1000000)
            Multi   Single
Seq Write   1       1
Seq Read    1       1
Random Read 99      90    (of which generating random numbers 59 ms)

Array dimensions: 200x200x200 (8000000)
            Multi   Single
Seq Write   14      13
Seq Read    11      8
Random Read 1482    1239    (of which generating random numbers 474 ms)

Array dimensions: 300x300x300 (27000000)
            Multi   Single
Seq Write   53      46
Seq Read    34      24
Random Read 5915    4418    (of which generating random numbers 1557 ms)

Array dimensions: 400x400x400 (64000000)
            Multi   Single
Seq Write   123     111
Seq Read    71      55
Random Read 16326   11144    (of which generating random numbers 3693 ms)

This shows the 1-dimensional array to be faster. Probably the reason is that the 3-dimensional array requires more memory accesses.

I also did some measurements to estimate the overhead of generating the random numbers in the Random Read benchmark, by replacing preventOptimizingAway += array.get(x, y, z); with preventOptimizingAway += x * y * z; and then added the measurements to the above results table by hand. Generating the random numbers takes 1/3 or less of the total time of the Random Read benchmark, so the memory accesses dominate the benchmark as expected. It would be interesting to repeat this benchmark with arrays of 4 or more dimensions. Probably it would make the speed difference bigger, because the multidimensional array's topmost levels will fit into the CPU's cache, and only the remaining levels will require a memory lookup.
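For anyone reproducing this: preventOptimizingAway is just a guard against the JIT removing the benchmark loop as dead code. A minimal sketch of the pattern (hypothetical names; the actual benchmark code is linked above):

long preventOptimizingAway = 0;
long start = System.nanoTime();
for (int i = 0; i < accessCount; i++) {
    // Accumulate every read so the loop has an observable result
    // and cannot be eliminated by the JIT.
    preventOptimizingAway += array.get(xs[i], ys[i], zs[i]);
}
long millis = (System.nanoTime() - start) / 1000000;
System.out.println(millis + " ms (checksum " + preventOptimizingAway + ")");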

Esko Luontola