Hi there.
I'm putting together some scientific code in Fortran 77, and I'm debating which of two approaches would be faster.
Basically, I have an M x N matrix, which I'll call A, with M larger than N. Later in the code, I need to multiply transpose(A) by a number of vectors.
My question is: would it be faster to transpose A myself and store the result, or to pass A unchanged and just set the transpose flag when I call BLAS?
Thanks! -Patrick