Hi all, I got a problem on my exam for the subject Principles of Programming Languages. I thought about it for a long time but I still do not understand it.

Problem: Below is a C program, executed in the MSVC++ 6.0 environment on a PC with roughly this configuration: Intel 1.8 GHz CPU, 512 MB RAM.

#include <stdio.h>
#include <time.h>

#define M 10000
#define N 5000
int a[M][N];

int main(void) {
    int i, j;
    time_t start, stop;

    // Part A
    start = time(0);
    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            a[i][j] = 0;
    stop = time(0);
    printf("%ld\n", (long)(stop - start));

    // Part B
    start = time(0);
    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            a[i][j] = 0;
    stop = time(0);
    printf("%ld\n", (long)(stop - start));
    return 0;
}

Explain why part A executes in only 1 s, while part B takes 8 s to finish.

Thanks a lot for your help,

Minh

+10  A: 

Row-major order versus column-major order.

Recall first that in C a multi-dimensional array is represented as a contiguous block of memory. Thus the two-dimensional array a[m][n] is laid out in memory as

a00 a01 a02 ... a0n a10 a11 a12 ... a1n a20 ... amn

In the first loop, you run through this block of memory sequentially. Thus, you run through the array traversing the elements in the following order

a00  a01  a02  ...  a0n  a10  a11  a12  ...  a1n  a20 ... amn

1    2    3         n    n+1  n+2  n+3 ...   2n   2n+1    mn

In the second loop, you skip around in memory and run through the array traversing the elements in the following order

a00  a10  a20  ...  am0  a01  a11  a21  ...  am1  a02  ...  amn

or, perhaps more clearly,

a00  a01  a02  ...  a10  a11  a12  ...  a20 ... amn
1    m+1  2m+1      2    m+2  2m+2      3       mn

All that skipping around really hurts you because you lose the advantages of caching. When you run through the array sequentially, neighboring elements are loaded into the cache along with the one you asked for. When you skip around, you don't get this benefit and instead keep incurring cache misses, which hurts performance.

Jason
Could you explain more? I was looking for this in the book Principles of Programming Languages but I still could not find it :(
Minh
+1. A quick wikipedia link for the question asker: http://en.wikipedia.org/wiki/Row-major_order
ChristopheD
@Minh: Look for the phrase "locality of reference", maybe?
khedron
+5  A: 

Because of hardware architectural optimizations. Part A is executing operations on sequential memory addresses, which allows the hardware to substantially accelerate how the calculations are handled. Part B is basically jumping around in memory all the time, which defeats a lot of hardware optimizations.

Key concept for this specific case is processor cache.

chaos
+20  A: 

This has to do with how the array's memory is laid out and how it gets loaded into the cache and accessed. In version A, when a cell of the array is accessed, its neighbors are loaded into the cache with it, and the code then immediately accesses those neighbors. In version B, one cell is accessed (and its neighbors loaded into the cache), but the next access is far away, on the next row. So a whole cache line is loaded but only one value is used, and another cache line must be filled for each access. Hence the speed difference.

Carl Seleborg
+1 for clearer and more precise explanation than mine. :)
chaos
+1 I learn something new every day here.
John Pirie
MSDN had a nice article recently talking about this in relation to bitmaps: http://msdn.microsoft.com/en-us/magazine/cc850829.aspx
TheArtTrooper
+6  A: 

The array you are declaring is laid out row by row in memory. Basically you have one large block of M×N integers, and C does a little index arithmetic to make you believe it's rectangular. But in reality it's flat.

So when you iterate through it row by row (with i, which runs to M, as the outer loop variable) you are really going linearly through memory, something the CPU cache handles very well.

However, when you iterate with j in the outer loop, you are always making large jumps in memory (at least that's what it looks like to the hardware). You access the first cell of a column, then move N integers further to reach the next row, and so on. Since a row here is N × 4 = 20000 bytes while memory pages are usually only 4 KiB, each iteration of the inner loop touches a different page. That way nearly every caching strategy fails and you see a major slowdown.

Joey
A: 

I forgot that basic characteristic of C arrays. Thanks everyone for your explanations :)

Minh
Use comments for messages like this. And if you have your answer, I want to encourage you to mark an answer -- the answer that best helped you -- as accepted; click the green check mark. You'll get some rep. points. :-)
Frank V
Thanks Frank, I just started using this cool website and still need to learn from you guys how to use it well. Thanks :)
Minh
@Minh: If it helps, I *don't* think this is a basic characteristic of a C array. It's actually a pretty subtle operating system point. If all memory were magically held in the CPU with no cost for lookups, either case would be fine.
khedron
+1  A: 

The trouble here is how your array is laid out in memory.

In computer memory, arrays are normally allocated so that all columns of the first row come first, then all columns of the second row, and so on.

Your computer memory is best viewed as a long stripe of bytes -- it is a one-dimensional array of memory, not two-dimensional -- so multi-dimensional arrays have to be allocated in the way described.

Now comes a further problem: modern CPUs have caches. They have multiple levels of them, and the first-level cache works in so-called "cache lines". What does this mean? Access to main memory is fast, but not fast enough; modern CPUs are much faster, so they have on-chip caches to speed things up. They also no longer fetch single memory locations, but fill one complete cache line in one access. That too is for performance, but it only benefits operations that process data linearly. When you access all columns of a row first, then the next row, and so on, you are in fact working linearly. When you instead process the first column of every row, you "jump" around in memory: you force a new cache line to be filled for each access, use only a few bytes of it, and the line may then be evicted before your next visit...

Thus traversing a row-major array in column order is bad on modern processors, since the accesses are not linear.

Juergen
Got your idea, Juergen. Thanks
Minh