Hi all, I got a problem on my exam for the subject Principles of Programming Languages. I thought about it for a long time but I still do not understand it.

Problem: Below is a C program, executed in the MSVC++ 6.0 environment on a PC with roughly this configuration: Intel 1.8 GHz CPU, 512 MB RAM.

#include <stdio.h>
#include <time.h>

#define M 10000
#define N 5000
int a[M][N];

int main(void) {
    int i, j;
    time_t start, stop;

    // Part A
    start = time(0);
    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            a[i][j] = 0;
    stop = time(0);
    printf("%ld\n", (long)(stop - start));

    // Part B
    start = time(0);
    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            a[i][j] = 0;
    stop = time(0);
    printf("%ld\n", (long)(stop - start));
    return 0;
}

Explain why part A executes in only 1 s, while part B takes 8 s to finish.

Thanks a lot for your help,

Minh

+10  A: 

Row-major order versus column-major order.

Recall first that in C a multi-dimensional array is represented as a contiguous block of memory. Thus the two-dimensional array a[m][n] is laid out in memory as

a00 a01 a02 ... a0n a10 a11 a12 ... a1n a20 ... amn

In the first loop, you run through this block of memory sequentially. Thus, you run through the array traversing the elements in the following order

a00  a01  a02  ...  a0n  a10  a11  a12  ...  a1n  a20 ... amn

1    2    3         n    n+1  n+2  n+3 ...   2n   2n+1    mn

In the second loop, you skip around in memory and run through the array traversing the elements in the following order

a00  a10  a20  ...  am0  a01  a11  a21  ...  am1  a02  ...  amn

or, perhaps more clearly,

a00  a01  a02  ...  a10  a11  a12  ...  a20 ... amn
1    m+1  2m+1      2    m+2  2m+2      3       mn

All that skipping around really hurts you because you lose the advantages of caching. When you run through the array sequentially, neighboring elements are loaded into the cache along with the one you asked for. When you skip around, you don't get this benefit and instead keep incurring cache misses, which hurts performance.

Jason
Could you explain more? I was looking for this in the book Principles of Programming Languages but I still could not find it :(
Minh
+1. A quick wikipedia link for the question asker: http://en.wikipedia.org/wiki/Row-major_order
ChristopheD
@Minh: Look for the phrase "locality of reference", maybe?
khedron
+5  A: 

Because of hardware architectural optimizations. Part A is executing operations on sequential memory addresses, which allows the hardware to substantially accelerate how the calculations are handled. Part B is basically jumping around in memory all the time, which defeats a lot of hardware optimizations.

Key concept for this specific case is processor cache.

chaos
+20  A: 

This has to do with how the array's memory is laid out and how it gets loaded into the cache and accessed. In version A, when a cell of the array is accessed, its neighbors are loaded into the cache with it, and the code then immediately accesses those neighbors. In version B, one cell is accessed (and its neighbors loaded into the cache), but the next access is far away, on the next row. So a whole cache line is loaded but only one value is used, and another cache line must be filled for each access. Hence the speed difference.

Carl Seleborg
+1 for clearer and more precise explanation than mine. :)
chaos
+1 I learn something new every day here.
John Pirie
MSDN had a nice article recently talking about this in relation to bitmaps: http://msdn.microsoft.com/en-us/magazine/cc850829.aspx
TheArtTrooper
+6  A: 

The array you are declaring is laid out row by row in memory. Basically you have one large block of M×N integers, and C does a little index arithmetic to make you believe it's rectangular. But in reality it's flat.

So when you iterate through it row by row (with i, which runs to M, as the outer loop variable) you are really going linearly through memory, something the CPU cache handles very well.

However, when you iterate with j in the outer loop, you are always making large jumps in memory (at least that's what it looks like to the hardware). You access the first cell of a column, then move N integers further to reach the next row, and so on. Since a row here is N × 4 = 20000 bytes while memory pages are usually only 4 KiB, each iteration of the inner loop touches a different page. That way nearly every caching strategy fails and you see a major slowdown.

Joey
A: 

I forgot that basic characteristic of C arrays. Thanks everyone for your explanations :)

Minh
Use comments for messages like this. And if you have your answer, I want to encourage you to mark an answer -- the answer that best helped you -- as accepted; click the green check mark. You'll get some rep. points. :-)
Frank V
Thanks Frank, I just started using this cool website and still need to learn from you guys how to use it well. Thanks :)
Minh
@Minh: If it helps, I *don't* think this is a basic characteristic of a C array. It's actually a pretty subtle operating system point. If all memory were magically held in the CPU with no cost for lookups, either case would be fine.
khedron
+1  A: 

The trouble here is how your array is laid out in memory.

In computer memory, arrays are normally allocated so that all columns of the first row come first, then all columns of the second row, and so on.

Your computer memory is best viewed as a long stripe of bytes -- it is a one-dimensional array of memory, not two-dimensional -- so multi-dimensional arrays have to be allocated in the way described.

Now comes a further problem: modern CPUs have caches. They have multiple levels of them, and the first-level cache works in so-called "cache lines". What does this mean? Access to main memory is fast, but not fast enough; modern CPUs are much faster, so they have on-chip caches to speed things up. They also no longer fetch single memory locations, but fill one complete cache line in one access. That too is for performance, but it only benefits operations that process data linearly. When you access all columns of a row first, then the next row, and so on, you are in fact working linearly. When you instead process the first column of every row, you "jump" around in memory: you force a new cache line to be filled for each access, use only a few bytes of it, and the line may then be evicted before your next visit...

Thus traversing a row-major array in column order is bad on modern processors, since the accesses are not linear.

Juergen
Got your idea, Juergen. Thanks
Minh