ansaurus

Question

refactoring Java arrays and primitives (double[][]) to Collections and Generics (List<List<Double>>)

Answer 1

+3 A:

The general guideline is to prefer generified collections over arrays in Java, but it's only a guideline. My first thought would be to NOT change this working code. If you really want to make this change, then benchmark both approaches.

As you say, performance is critical, in which case the code that meets the needed performance is better than code that doesn't.

You might also run into auto-boxing issues when boxing/unboxing the doubles - a potentially more subtle problem.

The Java language guys have been very strict about keeping the JVM compatible across different versions so I don't see arrays going anywhere - and I wouldn't call them obsolete, just more primitive than the other options.

hbunny 2009-09-11 07:36:31

for preformance-critical code, stick to the code you have if it is fast enough.

Andreas Petersson 2009-09-11 08:15:52

Prefer collections over reference arrays. But there isn't equivalents of primitive arrays in the Java library.

Tom Hawtin - tackline 2009-09-11 09:18:40

Answer 2

+7 A:

I read an excellent book by Kent Beck on coding best-practices ( http://www.amazon.com/Implementation-Patterns/dp/B000XPRRVM ). There are also interesting performance figures. Specifically, there are comparison between arrays and various collections., and arrays are really much faster (maybe x3 compared to ArrayList).

Also, if you use Double instead of double, you need to stick to it, and use no double, as auto(un)boxing will kill your performance.

Considering your performance need, I would stick to array of primitive type.

Even more, I would calculate only once the upper bound for the condition in loops. This is typically done the line before the loop.

However, if you don't like that the upper bound variable, used only in the loop, is accessible outside the loop, you can take advantage of the initialization phase of the for loop like this:

    for (int i=0, max=list.size(); i<max; i++) {
      // do something
    }

I don't believe in obsolescence for arrays in java. For performance-critical loop, I can't see any language designer taking away the fastest option (especially if the difference is x3).

I understand your concern for maintainability, and for coherence with the rest of the application. But I believe that a critical loop is entitled to some special practices.

I would try to make the code the clearest possible without changing it:

by carefully questionning each variable name, ideally with a 10-min brainstorming session with my collegues
by writing coding comments (I'm against their use in general, as a code that is not clear should be made clear, not commented ; but a critical loop justifies it).
by using private methods as needed (as Andreas_D pointed out in his answer). If made private final, chances are very good (as they would be short) that they will get inlined when running, so there would be no performance impact at runtime.

KLE 2009-09-11 07:37:14

Neat. I assume this stops repeated calls to size()?

peter.murray.rust 2009-09-11 07:43:21

Yes, you are correct ;-) The part of the loop that is before the first ';' is the initialization, it is done only once before the loop starts.

KLE 2009-09-11 08:06:39

My project involves many complex DSP calcuations, some of which are run in native code. We're dealing with files from 4GB to 2TB and samples rates into the 1GHZ range. Primitive arrays are typically WAY better in terms of both memory AND speed. Primitive arrays are also better if you even need to touch native code (JNI).

basszero 2009-09-11 13:35:14

@basszero Thanks for you useful input on such extreme cases, that few of us can get our hands dirty with ;-)

KLE 2009-09-11 13:51:07

Answer 3

A:

When you know the exact dimensions of the list you should stick with arrays. Arrays are not inherently bad, and they're not going anywhere. If you are performing a lot of (non-sequential) read and write operations you should use arrays and not lists, because the access methods of lists introduce a large overhead.

elmuerte 2009-09-11 07:38:40

Answer 4

+2 A:

Well I think that arrays are the best way to store process data in algorithms. Since Java doesn't support operator overloading (one of the reasons why I think arrays won't be obsolete that soon) switching to collections would make the code quite hard to read:

double[][] matrix = new double[10][10];
double t = matrix[0][0];

List<List<Double>> matrix = new ArrayList<List<Double>>(10);
Collections.fill(matrix, new ArrayList<Double>(10));
double t = matrix.get(0).get(0); // autoboxing => performance

As far as I know Java prestores some wrapper Object for Number instances (e.g. the first 100 integers), so that you can access them faster but I think that won't help much with that many data.

Daff 2009-09-11 07:41:05

Agreed. The code as written is perfectly readable, and the representation of a matrix using arrays is much easier to understand than a list of lists with all of the ugly nested gets() that would come with it.

Jon 2009-09-11 08:37:51

Answer 5

+3 A:

I fully agree with KLE's answer. Because the code is performance-critical, I'd keep the array based datastructures as well. And I believe, that just introducing collections, wrappers for primitive types and generics will not improve maintainability and clarity.

In addition, if this algorithm is the heart of the application and has been in use for several years now, chance are fairly low, that it will need maintenance like bug fixing or improvements.

For clarity, maintainability and interfacing with the rest of the code I would like to refactor it.

Instead of changing datastructures I'd concentrate on renaming and maybe moving some part of the code to private methods. From looking at the code, I have no idea what's happening, and the problem, as I see it, are the more or less short and technical variable and field names.

Just an example: one 2-dimensional array is just named 'matrix'. But it's obviously clear, that this is a matrix, so naming it 'matrix' is pretty redundant. It would be more helpful to rename it so that it becomes clear, what this matrix is really used for, what kind of data is inside.

Another candidate is your second line. With two refactorings, I'd rename 'jj' to something more meaningful and move the expression to a private method with a 'speaking' name.

Andreas_D 2009-09-11 07:59:45

+1 for good answer, I also agree with you. Maybe also the private method could be made final, to improve the chances that it gets inlined by the compiler.

KLE 2009-09-11 08:13:51

Thanks. I have actually done this in the latest version - and it helps a lot. However I thought I would post the original as it shows the problem nicely.

peter.murray.rust 2009-09-11 08:15:19

Answer 6

A:

In addition to sticking with arrays, I think you can tighten up this code in some meaningful ways. For instance:

Indeed, don't compute the loop bounds every time, save them off
You repeatedly reference matrix[i]. Just save off a reference to this subarray rather than dereferencing the 2D array every time
That trick gets even more useful if you can loop over i in the outer loop instead of inner loop
It's getting extreme, but saving the value of j-1 in a local might even prove to be worth it rather than recomputing
Finally if you are really really concerned about performance, run the ProGuard optimizer over the resulting byte code to have it perform some compiler optimizations like unrolling or peephole optimizations

Sean Owen 2009-09-15 14:08:58

Answer 7

+1 A:

I thought compilers were meant to be smart at doing this sort of thing. Do we need to still do this?

You are probably right that the JIT takes care of it, but if this section is so performance critical, trying and benchmarking wouldn't hurt.

Confusion 2010-01-30 00:30:59

ansaurus

tags:

views:

answers:

refactoring Java arrays and primitives (double[][]) to Collections and Generics (List<List<Double>>)

related questions