Code I:

for(i=0; i<100; i++){
  for(j=0; j<1000; j++){
    x = y;
  }
}

Code II:

for(i=0; i<1000; i++){
  for(j=0; j<100; j++){
    x = y;
  }
}

Can you explain why one of these loop configurations will take more time to run than the other?

+3  A: 

This really depends on multiple factors, all out of your direct control.

As user David V says in the comments, both will simply be eliminated by a good compiler. If they are not, they will be translated into machine code with branch instructions. When a processor runs branching code it uses branch prediction, which behaves differently depending on the exact instructions the code is translated into. Other factors can kick in too, for example instruction-cache misses. You can't really tell until you measure carefully and thoroughly analyze the results.
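
For illustration, a minimal timing sketch (my own addition, not from the question; the volatile variables and the clock() calls are just one way to keep the compiler from deleting the loops outright):

#include <stdio.h>
#include <time.h>

volatile int x, y = 42;            /* volatile so x = y is not optimized away */

int main(void) {
    clock_t t0 = clock();
    for (int i = 0; i < 100; i++)          /* Code I: 100 outer, 1000 inner */
        for (int j = 0; j < 1000; j++)
            x = y;
    clock_t t1 = clock();
    for (int i = 0; i < 1000; i++)         /* Code II: 1000 outer, 100 inner */
        for (int j = 0; j < 100; j++)
            x = y;
    clock_t t2 = clock();
    printf("Code I : %.6f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("Code II: %.6f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}

Even then, run it many times on the target machine; a single run tells you very little.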

sharptooth
A: 

A good answer is probably: they are both inefficient ways of finding something in a two-dimensional array, and you should try some kind of indexing to avoid the scan altogether.

This was an interview question, right? Well, expect an interview answer :)

Lars Mæhlum
A: 

In general the inner loop has a good chance of fitting entirely in cache, so the 100-outer/1000-inner version (Code I) should be faster. But compilers are so smart...

Giuseppe Guerrini
There doesn't seem to be any code involved here that would use more than one cache line, so it shouldn't make a difference at all.
Lasse V. Karlsen
Yes, in this particular case. My consideration was about the generic case of two nested loops (see the sketch below). Regards
Giuseppe Guerrini
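
To put that generic case in concrete terms, a sketch (the large 2D array and the two traversal orders are assumptions for illustration, not part of the original code): with a big enough array, the nesting order decides whether the inner loop's working set fits in cache.

#define ROWS 1000
#define COLS 1000
static double a[ROWS][COLS];       /* ~8 MB, far larger than a typical L1/L2 cache */

/* Inner loop walks one contiguous ~8 KB row: fits easily in cache. */
double sum_row_major(void) {
    double s = 0.0;
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            s += a[i][j];
    return s;
}

/* Inner loop touches one element from each of the 1000 rows, each access
   a full row (8000 bytes) away from the previous one, so nearly every
   access lands on a new cache line. */
double sum_col_major(void) {
    double s = 0.0;
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            s += a[i][j];
    return s;
}

Both functions do the same arithmetic; only the nesting order differs, and on typical hardware the second one is noticeably slower.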
A: 

I may point out that any good compiler (though not one quite good enough to eliminate the loops entirely, as David mentioned above) will compile this down to straightforward CPU instructions, with no need for branching or for any of the branch prediction logic that helps avoid pipeline stalls.

In fact, there is a trivial CPU-level construct (the loop instruction) that will do the above with minimal software on top. And since multiplication is commutative, 100x1000 and 1000x100 come to the same number of iterations.
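
A trivial sanity check of the commutativity point (my own sketch, nothing more): both shapes execute the body exactly 100 * 1000 = 100,000 times.

#include <stdio.h>

int main(void) {
    long body_i = 0, body_ii = 0;
    for (int i = 0; i < 100; i++)          /* Code I */
        for (int j = 0; j < 1000; j++)
            body_i++;
    for (int i = 0; i < 1000; i++)         /* Code II */
        for (int j = 0; j < 100; j++)
            body_ii++;
    printf("%ld %ld\n", body_i, body_ii);  /* prints: 100000 100000 */
    return 0;
}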

Hiato
+1  A: 

The other answers are generally correct, in my opinion: the loops would likely be optimized out, the result would depend on the generated machine code, and so on. Still, in the most simplistic case, assuming no optimization and no branch prediction (which may not be realistic), Code I would prove to be faster, because there is some overhead in setting up a loop: the counter variables i and j have to be initialized, and the exit condition has to be evaluated one extra time. The outer loop's overhead is paid only once, so the inner loop is the real factor here. In Code I the inner loop is set up only 100 times, while in Code II it is set up 1000 times, so Code I should be faster. Again, this is the most simple case, which is probably what the interview or quiz question was fishing for.
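
A rough back-of-the-envelope count of that overhead (my own model, assuming completely naive code: one initialization per loop entry, N+1 condition tests, and N increments for a loop of N iterations):

#include <stdio.h>

static long overhead(long outer, long inner) {
    long outer_ops = 1 + (outer + 1) + outer;              /* init, tests, increments */
    long inner_ops = outer * (1 + (inner + 1) + inner);    /* inner loop set up once per outer pass */
    return outer_ops + inner_ops;
}

int main(void) {
    printf("Code I  (100 x 1000): %ld overhead ops\n", overhead(100, 1000));   /* 200402 */
    printf("Code II (1000 x 100): %ld overhead ops\n", overhead(1000, 100));   /* 204002 */
    return 0;
}

By this crude count Code II does about 3,600 more loop-bookkeeping operations than Code I; negligible in absolute terms, but it is the direction the question is presumably fishing for.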

Mutmansky