I was working on some code recently and came across a method that had 3 for-loops that worked on 2 different arrays.

Basically, one foreach loop would walk through a vector and convert a DateTime from each object, and then another foreach loop would convert a long value from each object. Each of these loops stored its converted values into a list.

The final loop would go through these two lists and store those values into yet another list because one final conversion needed to be done for the date.

Then, after all that is said and done, the final two lists are converted to arrays using ToArray().

Ok, bear with me, I'm finally getting to my question.

So, I decided to make a single for loop to replace the first two foreach loops and convert the values in one fell swoop (the third loop is quasi-necessary, although I'm sure with some work I could fold it into the single loop too).

But then I read the article "What your computer does while you wait" by Gustavo Duarte and started thinking about memory management and what the data is doing while it's being accessed in the for-loop where two lists are being accessed simultaneously.

So my question is, what is the best approach for something like this? Try to condense the for-loops so it happens in as few loops as possible, causing multiple data accesses for the different lists? Or allow the multiple loops and let the system bring in the data it's anticipating? These lists and arrays can be potentially large, and looping through 3 lists, perhaps 4 depending on how ToArray() is implemented, can get very costly (O(n^3)??). But from what I understood in said article and from my CS classes, having to fetch data can be expensive too.

Would anyone like to provide any insight? Or have I completely gone off my rocker and need to relearn what I have unlearned?

Thank you

+5  A: 

The best approach? Write the most readable code, work out its complexity, and work out if that's actually a problem.

If each of your loops is O(n), then you've still only got an O(n) operation.

Having said that, it does sound like a LINQ approach would be more readable... and quite possibly more efficient as well. Admittedly we haven't seen the code, but I suspect it's the kind of thing which is ideal for LINQ.
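
Purely to illustrate the shape (the real code wasn't posted, so Record, Timestamp, Value and FixUpDate below are invented stand-ins for whatever the actual conversions are), the three loops might collapse into something like:

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch only -- the names are guesses at the
// structure described in the question, not the actual code.
class Record
{
    public DateTime Timestamp;
    public long Value;
}

static class LinqSketch
{
    // Stand-in for the "one final conversion" done on the date.
    static DateTime FixUpDate(DateTime d)
    {
        return d.Date;
    }

    static void Convert(List<Record> source, out DateTime[] dates, out long[] values)
    {
        // One declarative pass per output; no intermediate lists to juggle.
        dates = source.Select(r => FixUpDate(r.Timestamp)).ToArray();
        values = source.Select(r => r.Value).ToArray();
    }
}

Whether that beats the hand-rolled loops is a measurement question, but it reads far more directly.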

Jon Skeet
"We should forget about small efficiencies, say about 97% of the time: **premature optimization is the root of all evil**." -Knuth, Donald
Matt Cruikshank
Yes, but there's still that 3%.
TraumaPony
That 3% only counts when you actually know you're operating within that margin. If you don't know if your code is actually a bottleneck, why bother optimizing it? "Measure twice, cut once" ;-)
Erik van Brakel
O(n/2) < O(n). But as said before, you don't really worry about the constants until you benchmark your code... How can you tell a small efficiency from a large one? 1) experience, 2) testing.
Ape-inago
Technically, O(n/2) = O(n), but n/2 < n...
Mr Fooz
+3  A: 

Am I understanding you correctly in this?

You have these loops:

for (...){
  // Do A
}

for (...){
  // Do B
}

for (...){
  // Do C
}

And you converted it into

for (...){
  // Do A
  // Do B
}
for (...){
  // Do C
}

and you're wondering which is faster?

If not, some pseudocode would be nice, so we could see what you meant. :)

Impossible to say. It could go either way. You're right, fetching data is expensive, but locality is also important. The first version may be better for data locality, but on the other hand, the second has bigger blocks with no branches, allowing more efficient instruction scheduling.

If the extra performance really matters (as Jon Skeet says, it probably doesn't, and you should pick whatever is most readable), you really need to measure both options, to see which is fastest.

My gut feeling says the second, with more work being done between jump instructions, would be more efficient, but it's just a hunch, and it can easily be wrong.
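
If you do measure, a rough harness might look like the sketch below (FusedVersion and SeparateVersion are placeholders for the two loop shapes; timings this coarse only reveal large differences):

using System;
using System.Diagnostics;

static class LoopBenchmark
{
    static void Main()
    {
        const int iterations = 100;
        var sw = new Stopwatch();

        // Run both once first so JIT compilation isn't charged
        // to whichever version happens to go first.
        FusedVersion();
        SeparateVersion();

        sw.Start();
        for (int i = 0; i < iterations; i++) FusedVersion();
        sw.Stop();
        Console.WriteLine("Fused:    {0} ms", sw.ElapsedMilliseconds);

        sw.Reset();
        sw.Start();
        for (int i = 0; i < iterations; i++) SeparateVersion();
        sw.Stop();
        Console.WriteLine("Separate: {0} ms", sw.ElapsedMilliseconds);
    }

    // Placeholders: drop the one-loop and multi-loop variants in here.
    static void FusedVersion() { }
    static void SeparateVersion() { }
}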

jalf
+4  A: 

For reference,

the article is at What your computer does while you wait - Gustavo Duarte

Also there's a guide to big-O notation.


It's impossible to answer the question without being able to see code/pseudocode. The only reliable answer is "use a profiler". Assuming what your loops are doing is a disservice to you and anyone who reads this question.

Robert Paulson
+1  A: 

Working on one piece of data but with two functions can sometimes make it so that the code acting on that data doesn't fit in the processor's low-level caches.

for (int i = 0; i < 10; i++) {
  myObject obj = array[i];
  obj.functionreallybig1(); // pushes functionreallybig2 out of cache
  obj.functionreallybig2(); // pushes functionreallybig1 out of cache
}

vs

for (int i = 0; i < 10; i++) {
  myObject obj = array[i];
  obj.functionreallybig1(); // this stays in the cache next time through the loop
}


for (int i = 0; i < 10; i++) {
  myObject obj = array[i];
  obj.functionreallybig2(); // this stays in the cache next time through the loop
}

But it was probably a mistake (usually this type of trick is commented).

When data is cyclically loaded and unloaded like this, it is called cache thrashing, btw.

This is a separate issue from the data these functions are working on, as the processor typically caches that separately.

Ape-inago
+1  A: 

Aside from cache thrashing with large functions, there may be benefits for tiny functions as well. This applies to any auto-vectorizing compiler (not sure if the Java JIT will do this yet, but you can count on it eventually).

Suppose this is your code:

// if this compiles down to a raw memory copy with a bitmask...
Date morningOf(Date d) { return Date(d.year, d.month, d.day, 0, 0, 0); }

Date timestamps[N];
Date mornings[N];

// ... then this can be parallelized using SSE or other SIMD instructions
for (int i = 0; i != N; ++i)
    mornings[i] = morningOf(timestamps[i]);

// ... and this will just run like normal
for (int i = 0; i != N; ++i)
    doOtherCrap(mornings[i]);

For large data sets, splitting the vectorizable code out into a separate loop can be a big win (provided caching doesn't become a problem). If it was all left as a single loop, no vectorization would occur.

This is something that Intel recommends in their C/C++ optimization manual, and it really can make a big difference.
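
As a rough C# analogue (a sketch under assumptions, not the code above: it presumes the "morning of" operation really is a bitmask on a packed timestamp, and it uses System.Numerics, whose SIMD support depends on the runtime), the split-out loop can even be vectorized by hand:

using System;
using System.Numerics;

static class SimdSketch
{
    // Hypothetical: clearing the low bits of a packed timestamp
    // is assumed to yield "midnight of the same day".
    static void MorningsOf(long[] timestamps, long[] mornings, long mask)
    {
        int width = Vector<long>.Count;     // lanes per SIMD register
        var vmask = new Vector<long>(mask); // broadcast the mask to all lanes
        int i = 0;

        // Vectorized main loop: processes 'width' timestamps per iteration.
        for (; i <= timestamps.Length - width; i += width)
        {
            var v = new Vector<long>(timestamps, i);
            (v & vmask).CopyTo(mornings, i);
        }

        // Scalar tail for the leftover elements.
        for (; i < timestamps.Length; i++)
            mornings[i] = timestamps[i] & mask;
    }
}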

Tom
+1  A: 
Agent Worm
+2  A: 

Well, you've got complications if the two vectors are of different sizes. As has already been pointed out, this doesn't increase the overall complexity of the issue, so I'd stick with the simplest code - which is probably 2 loops, rather than 1 loop with complicated test conditions re the two different lengths.

Actually, those length tests could easily make a single loop slower than the two separate loops. You might also get better memory fetch performance with 2 loops - i.e. you are reading contiguous memory - A[0],A[1],A[2]... then B[0],B[1],B[2]... - rather than interleaving A[0],B[0],A[1],B[1],A[2],B[2]... A minimal sketch of the two access patterns is below (Process and the arrays are placeholders):
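
using System;

static class TwoLoopsSketch
{
    static void Process(int x) { /* stand-in for the real conversion */ }

    static void Demo(int[] a, int[] b)
    {
        // Two passes: each walks one array contiguously
        // (a[0], a[1], a[2]... then b[0], b[1], b[2]...).
        for (int i = 0; i < a.Length; i++) Process(a[i]);
        for (int i = 0; i < b.Length; i++) Process(b[i]);

        // One fused pass: interleaves a[i] and b[i], and needs extra
        // length handling as soon as the sizes can differ.
        int shared = Math.Min(a.Length, b.Length);
        for (int i = 0; i < shared; i++)
        {
            Process(a[i]);
            Process(b[i]);
        }
        for (int i = shared; i < a.Length; i++) Process(a[i]); // tail if a is longer
        for (int i = shared; i < b.Length; i++) Process(b[i]); // tail if b is longer
    }
}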

So in every way, I'd go with 2 separate loops ;-p

Marc Gravell
A: 

Thank you, everyone, for the information. Thinking in terms of Big-O and how to optimize has never been my strong point. I believe I am going to put the code back the way it was; I should have trusted the way it was written before instead of jumping on my novice instincts. Also, in the future I will include more references so everyone can understand what the heck I'm talking about (clarity is also not a strong point of mine :-/).

Thank you again.

Agent Worm