Locality of reference. Because the data is stored by rows, for each row the j columns are in adjacent memory addresses. The OS will typically load an entire page from memory into the cache and adjacent address references will likely refer to that same page. If you increment by the row index in the inner loop it is possible that these rows will be on different pages (since they are separated by j doubles each) and the cache may have to constantly bring in and throw away pages of memory as it references the data. This is called thrashing and is bad for performance.
In practice and with larger, modern caches, the sizes of the rows/columns would need to be reasonably large before this would come into play, but it's still good practice.
[EDIT] The answer above is specific to C and may differ for other languages. The only one that I know is different is FORTRAN. FORTRAN stores things in column major order (the above is row major) and it would be correct to change the order of the statements in FORTRAN. If you want/need efficiency, it's important to know how your language implements data storage.