I have two for loops that look up values in two different arrays (each around 2-4k elements at peak) and set a value in a third array based on them. For some weird reason there is roughly a factor-of-two difference in the performance of this piece of code depending on the order in which I nest the two loops.

This is the first setup. It executes in ~150 milliseconds on my PC:

public static int[] SchoolMultiplication(int[] a, int[] b, int numberBase)
{
    List<double> times = new List<double>();
    TimeTest timeTest = new TimeTest();

    int aLen = a.Length;
    int bLen = b.Length;

    int[,] resultMatrix = new int[a.Length + b.Length, aLen];
    int[] result = new int[a.Length + b.Length];

    timeTest.Start();

    for (int horizontalIndex = 0; horizontalIndex < b.Length; horizontalIndex++)
    {
        for (int verticalIndex = 0; verticalIndex < a.Length; verticalIndex++)
        {
            resultMatrix[a.Length + b.Length - 1 - verticalIndex - horizontalIndex, verticalIndex] = a[a.Length - verticalIndex - 1] * b[b.Length - horizontalIndex - 1];
        }
    }

Now, if I change nothing but the order of the loops, like this:

for (int verticalIndex = 0; verticalIndex < a.Length; verticalIndex++)
{
    for (int horizontalIndex = 0; horizontalIndex < b.Length; horizontalIndex++)
    {
        resultMatrix[a.Length + b.Length - 1 - verticalIndex - horizontalIndex, verticalIndex] = a[a.Length - verticalIndex - 1] * b[b.Length - horizontalIndex - 1];
    }
}

The total running time of the method climbs to about 400 milliseconds. How can a simple exchange of the loop order change performance by almost a factor of three? I suppose it is some kind of caching or pointer-arithmetic effect?

+16  A: 
Gavin Miller
It's like a matrix of chairs in a cinema: visiting each chair by traversing row by row is faster than going column by column.
egon
Without a cache, however, the order of traversal through random-access memory (RAM) doesn't matter (assuming the whole array is in RAM): "The word random thus refers to the fact that any piece of data can be returned in a constant time, regardless of its physical location and whether or not it is related to the previous piece of data.[1]" http://en.wikipedia.org/wiki/Random-access_memory
Liran Orevi
+1  A: 

It is very likely related to cache hits and misses. The difference comes down to sequential versus scattered access, where the distance between consecutive accesses is larger than a single cache line.
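
As an illustration of that point, here is a minimal, self-contained sketch that times a row-wise versus a column-wise traversal of a rectangular .NET array. The array size and class name are made up for the example and are not taken from the question; the exact numbers will vary by machine, but the row-wise pass should be noticeably faster on typical hardware.

using System;
using System.Diagnostics;

class TraversalOrderSketch
{
    static void Main()
    {
        const int rows = 4096, cols = 4096;
        int[,] grid = new int[rows, cols];
        var sw = Stopwatch.StartNew();

        // Row-wise: the inner loop walks the last index, which is contiguous
        // in memory, so consecutive writes usually hit the same cache line.
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                grid[r, c] = r + c;
        Console.WriteLine($"row-wise:    {sw.ElapsedMilliseconds} ms");

        // Column-wise: the inner loop walks the first index, so each write
        // lands cols * sizeof(int) bytes away from the previous one and
        // touches a different cache line.
        sw.Restart();
        for (int c = 0; c < cols; c++)
            for (int r = 0; r < rows; r++)
                grid[r, c] = r + c;
        Console.WriteLine($"column-wise: {sw.ElapsedMilliseconds} ms");
    }
}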

For plain C++ loops, it can also help to make the loops run backwards to shave a little off the loop overhead itself. I'm not sure how well that carries over to .NET.

jdehaan
Why does it help to make the loops run backwards?
Liran Orevi
If you have a look at the assembly code, the loop test becomes cheaper: when counting down to 0, the decrement already sets the CPU's zero flag, so the exit test is essentially free, whereas comparing against any other limit requires an extra CMP instruction (on x86 CPUs, for example).
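
For concreteness, here is a minimal sketch of the two loop shapes being compared; the class, method names, and the array parameter are invented for the example, and whether a given .NET JIT actually folds the exit test into the decrement is not something the snippet demonstrates on its own.

static class LoopDirectionSketch
{
    static int SumForward(int[] data)
    {
        int sum = 0;
        // Exit test compares the counter against an arbitrary limit (data.Length).
        for (int i = 0; i < data.Length; i++)
            sum += data[i];
        return sum;
    }

    static int SumBackward(int[] data)
    {
        int sum = 0;
        // Exit test compares against zero; a compiler can often reuse the flags
        // already set by the decrement instead of emitting a separate compare.
        for (int i = data.Length - 1; i >= 0; i--)
            sum += data[i];
        return sum;
    }
}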
jdehaan
+4  A: 

Locality, locality, locality of data. From Wikipedia (which says it better than I would have):

Linear data structures: Locality often occurs because code contains loops that tend to reference arrays or other data structures by indices. Sequential locality, a special case of spatial locality, occurs when relevant data elements are arranged and accessed linearly. For example, the simple traversal of elements in a one-dimensional array, from the base address to the highest element would exploit the sequential locality of the array in memory.[2] The more general equidistant locality occurs when the linear traversal is over a longer area of adjacent data structures having identical structure and size, and in addition to this, not the whole structures are in access, but only the mutually corresponding same elements of the structures. This is the case when a matrix is represented as a sequential matrix of rows and the requirement is to access a single column of the matrix.
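
To make the quoted row/column case concrete, here is a small sketch of how a rectangular .NET array is laid out; the dimensions and class name are invented for illustration. Element [r, c] of an int[rows, cols] array sits at linear position r * cols + c, so walking along a row moves one element at a time while walking down a column jumps a whole row's worth of elements per step.

using System;

class RowMajorLayoutSketch
{
    static void Main()
    {
        const int rows = 4, cols = 5;

        // Print the linear position of every element: rows are contiguous,
        // columns are strided by `cols` elements.
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                Console.WriteLine($"[{r},{c}] -> linear index {r * cols + c}");
    }
}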

Ed Swangren
A: 

I remember reading about this in Code Complete. In most languages, arrays are laid out with the last index stored sequentially, so you're accessing bytes that sit next to each other in memory when you iterate over the last index, instead of skipping around when you iterate over the first.

Kaleb Brasee
The last index is the one where the data would be sequentially ordered, not the first.
Mike Daniels
Ah yeah, you're right.
Kaleb Brasee
+1  A: 

Your intuition is right; it is a caching issue. @Mike Daniels' post on the question below essentially describes the exact same issue. The second bit of code will get far more cache hits.

http://stackoverflow.com/questions/997212/fastest-way-to-loop-through-a-2d-array

But, shhhh we're not supposed to care about performance right? :)

BobbyShaftoe
This code is being written for a performance competition in C#, so performance is absolutely crucial. I can't believe I didn't think of the memory layout.
Qua
@Qua, yeah I was just being facetious. The current party line among many people seems to be that performance no longer matters. But that's just silly.
BobbyShaftoe
A: 

I would also think that the relative sizes of the arrays a and b would make a difference.

If a.Length is large and b.Length is small, the second option should be faster. Conversely, if a.Length is small and b.Length is large, the first option would be faster. The issue is avoiding the setup/teardown cost of the inner loop.
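
For reference, a small sketch of what the inner loop's setup/teardown cost refers to: in a nested pair of loops the inner loop is initialised and exited once per outer iteration, so that overhead scales with the outer trip count. The class name and trip counts below are made up purely for illustration and are not the question's sizes.

using System;

class LoopOverheadSketch
{
    static void Main()
    {
        int outer = 4000, inner = 200;   // illustrative trip counts only
        int bodyRuns = 0, innerLoopEntries = 0;

        for (int i = 0; i < outer; i++)
        {
            innerLoopEntries++;          // inner loop set up once per outer pass
            for (int j = 0; j < inner; j++)
            {
                bodyRuns++;              // the body runs outer * inner times either way
            }
        }

        Console.WriteLine($"body executed {bodyRuns} times");
        Console.WriteLine($"inner loop entered {innerLoopEntries} times");
        // Swapping the nesting keeps bodyRuns the same but changes how many
        // times the inner loop's setup and final test are paid.
    }
}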

BTW, why do you have

int aLen = a.Length;

but then also call a.Length directly? It seems like you should choose one or the other.

Slaggg
While profiling the code to figure out what was happening, I played around with caching the array lengths; what you're seeing are scattered pieces of that attempt. There was no optimization gain, so I eventually got rid of it.
Qua
Why should the second option be faster if a.Length is large and b.Length is small?
Liran Orevi