Essentially this depends on the Chip architecture that you're dealing with. For most CPUs, it isn't possible to zero out whole swathes of memory at go, and therefore each word will require a separate memory operation, no matter what facilities your programming language provides.
It helps tremendously if your memory is contiguous for memory access time, because memory adjacent to memory just accessed will be cached, and subsequent accesses will hit the cache, resulting in fast performance.
The result of this is that if your matrix is large, it may be faster to zero out a row at a time or a column at a time, rather than vice versa, depending on whether your data is written by column or by row.
EDIT: I have assumed that your matrices aren't sparse, or triangular, or otherwise special, since you talk about "zeroing out a whole row". If you know that your matrix is mostly empty or somehow otherwise fits a special pattern, you would be able to represent your matrix in a different way (not a simple nxm array) and the story would be different. But if you have an nxm matrix right now, then this is the case.