ansaurus

Question

Answer 1

A:

Going the other way might be cheaper:

for j in range(N):
  for i in range(M):
    ij = j*M + i  # flattened (row-major) index of (j, i)
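A minimal sketch of both directions, assuming the question's collapsed loop recovers `j` and `i` from one flat index with division and modulo (the question text is not preserved here, so that is an assumption). The function names `collapsed` and `nested` are hypothetical. Going "the other way" replaces one `divmod` per iteration with a single multiply-add for the row-major index `j * M + i`.

```python
def collapsed(N, M):
    """Single flat loop: recover (j, i) from ij with division and modulo."""
    out = []
    for ij in range(N * M):
        j, i = divmod(ij, M)  # one integer division per iteration
        out.append((ij, j, i))
    return out


def nested(N, M):
    """Nested loops: compute ij from (j, i) with a multiply-add."""
    out = []
    for j in range(N):
        for i in range(M):
            ij = j * M + i  # row-major flattened index
            out.append((ij, j, i))
    return out


# Both shapes visit the same (ij, j, i) triples in the same order.
assert collapsed(3, 4) == nested(3, 4)
```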

ralu 2010-01-22 06:45:10

or, `ij=0; for j in N: for i in M: ...; ij += 1`

Jason Orendorff 2010-01-22 13:40:14
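The comment above replaces the multiply with a running counter (a form of strength reduction). A runnable sketch of that idea; the function name `nested_counter` is hypothetical, and the list it builds stands in for whatever the real loop body does.

```python
def nested_counter(N, M):
    """Nested loops that track the flat index with a running counter."""
    out = []
    ij = 0
    for j in range(N):
        for i in range(M):
            out.append((ij, j, i))  # the real loop body would use ij, j and i here
            ij += 1                 # no multiply: one increment per iteration
    return out


# The counter always agrees with the row-major formula j * M + i.
assert all(ij == j * 4 + i for ij, j, i in nested_counter(3, 4))
```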

Answer 2

+7 A:

In general, it offers no performance advantage to collapse the loop as described.

Compilers do sometimes collapse such loops, but typically in unexpected ways.

In particular languages, or on particular platforms, you can speed up loops in general by:

counting downwards
making the function called in the body 'inline', or having the code in the loop body rather than a separate function
configuring the compiler, typically via command-line options (for example GCC's `-funroll-loops` and `-fomit-frame-pointer`), to 'unroll' loops, omit frame pointers, and so on
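To illustrate what 'unrolling' means, here is a sum unrolled by hand by a factor of 4 (an arbitrary choice), with a remainder loop for the leftover elements. This only shows the transformation an optimizing compiler performs on compiled code; in an interpreter such as CPython it should not be expected to help. The function names are hypothetical.

```python
def sum_rolled(xs):
    total = 0
    for x in xs:
        total += x
    return total


def sum_unrolled4(xs):
    """Same sum with the loop body unrolled by a factor of 4."""
    total = 0
    n = len(xs)
    limit = n - n % 4  # largest multiple of 4 not exceeding n
    k = 0
    while k < limit:
        # Four additions per trip: fewer loop-condition checks and jumps.
        total += xs[k]
        total += xs[k + 1]
        total += xs[k + 2]
        total += xs[k + 3]
        k += 4
    # Remainder loop handles the last n % 4 elements.
    while k < n:
        total += xs[k]
        k += 1
    return total


assert sum_unrolled4(list(range(10))) == sum_rolled(list(range(10))) == 45
```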

But in all cases you have to have profiled your code to see that such efforts are warranted.

Generally, in my experience, nested loops like this are dominated by:

containers; avoid boxing and bounds checking where possible, when you know you're safe
the cost of calling other methods inside them; use 'inline' if that's available
pipeline stalls caused by poor locality of reference; rearrange your memory layout if possible
pipeline stalls caused by secondary conditions; fewer ifs and indirect references are better
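One common way to get "fewer ifs" in a nested loop is loop unswitching: when a condition cannot change inside the loops, test it once outside and write a separate loop for each branch. A sketch; `scale_naive`, `scale_unswitched` and the `negate` flag are hypothetical.

```python
def scale_naive(grid, factor, negate):
    out = []
    for row in grid:
        new_row = []
        for v in row:
            if negate:  # loop-invariant test, repeated on every inner iteration
                new_row.append(-v * factor)
            else:
                new_row.append(v * factor)
        out.append(new_row)
    return out


def scale_unswitched(grid, factor, negate):
    out = []
    if negate:  # tested once, outside both loops
        for row in grid:
            out.append([-v * factor for v in row])
    else:
        for row in grid:
            out.append([v * factor for v in row])
    return out
```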

But that advice might not apply to your problem domain and platform. Profile!
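In the spirit of "Profile!", a quick `timeit` comparison of the collapsed (divide/modulo) loop and the nested loop. The sizes, the stand-in loop body and the repeat count are arbitrary assumptions; the results depend entirely on the interpreter and machine, so no numbers are claimed here.

```python
import timeit

N, M = 200, 200  # arbitrary example sizes


def collapsed_loop():
    acc = 0
    for ij in range(N * M):
        j, i = divmod(ij, M)
        acc += j - i  # stand-in for real work using j and i
    return acc


def nested_loop():
    acc = 0
    for j in range(N):
        for i in range(M):
            acc += j - i
    return acc


def compare(number=5):
    """Return the best of `number` single-run timings, in seconds, per loop shape."""
    assert collapsed_loop() == nested_loop()  # same work, same result
    return {
        "collapsed": min(timeit.repeat(collapsed_loop, number=1, repeat=number)),
        "nested": min(timeit.repeat(nested_loop, number=1, repeat=number)),
    }


if __name__ == "__main__":
    print(compare())
```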

Will 2010-01-22 06:51:15

++ Right, right, right. I would only say: instead of profiling, use stackshots (http://stackoverflow.com/questions/406760/whats-your-most-controversial-programming-opinion/1562802#1562802), because this kind of optimizing is almost always "barking up the wrong tree".

Mike Dunlavey 2010-01-22 17:11:10

Agreed: programming is for people, compiling is for computers.

ralu 2010-01-23 02:39:32

There are several reasons to do that: matrix/array iterators, optimization (only inner loops get vectorized), CUDA programming, and OpenMP programming (OpenMP 3.0 has a `collapse` clause for this). Compilers are very good, but sometimes they need help from a human.

aaa 2010-01-25 01:13:28

I think the OP should illustrate these cases, then, because I've never seen my CUDA code run short of registers from using a separate iterator for each dimension.

Will 2010-01-25 07:13:37


efficient loop collapse
