I'm working on some MATLAB code that processes large (but not huge) datasets: 10,000 784-element vectors (not sparse), computing information about them that is stored in a 10,000x10 sparse matrix. To get the code working I wrote the trickier parts iteratively: loops over the 10k items to process them, and a loop over the 10 columns of the sparse matrix for cleanup.

My process initially took 73 iterations (so, on the order of 730k loop passes) and ran in about 120 seconds. Not bad, but this is MATLAB, so I set out to vectorize it to speed it up.

In the end I have a fully vectorized solution that gets the same answer (so it's correct, or at least as correct as my initial solution), but it takes 274 seconds to run: less than half the speed of the loop version!

This is the first time I've run into MATLAB code that runs slower vectorized than it does iteratively. Are there any rules of thumb or best practices for identifying when this is likely or possible?

I'd love to share the code for some feedback, but it's for a currently open school assignment, so I really can't right now. If it ends up being one of those "Wow, that's weird, you probably did something wrong" things, I'll revisit this in a week or two to see if my vectorization is somehow off.

+4  A: 

Have you tried plotting the execution time as a function of problem size (either the number of elements per vector [currently 784], or the number of vectors [currently 10,000])? I ran into a similar anomaly when vectorizing a Gram-Schmidt orthogonalization algorithm: the vectorized version was faster until the problem grew to a certain size, at which point the iterative version actually ran faster, as seen in this plot:

[Plot: execution time comparison between vectorized and unvectorized implementations of the Gram-Schmidt orthogonalization algorithm]

Here are the two implementations and the benchmarking script:

clgs.m

function [Q,R] = clgs(A)
% QR factorization by unvectorized classical Gram-Schmidt orthogonalization

[m,n] = size(A);

R = zeros(n,n);     % pre-allocate upper-triangular matrix

% iterate over columns
for j = 1:n
    v = A(:,j);

    % project out components along the previously orthonormalized columns
    for i = 1:j-1
        R(i,j) = A(:,i)' * A(:,j);
        v = v - R(i,j) * A(:,i);
    end

    R(j,j) = norm(v);
    A(:,j) = v / norm(v);   % normalize
end
Q = A;

clgs2.m

function [Q,R] = clgs2(A)
% QR factorization by classical Gram-Schmidt orthogonalization with a
% vectorized inner loop

[m,n] = size(A);
R = zeros(n,n);     % pre-allocate upper-triangular matrix

for k=1:n
    R(1:k-1,k) = A(:,1:k-1)' * A(:,k);
    A(:,k) = A(:,k) - A(:,1:k-1) * R(1:k-1,k);
    R(k,k) = norm(A(:,k));
    A(:,k) = A(:,k) / R(k,k);
end

Q = A;

benchgs.m

n = [300,350,400,450,500];

clgs_time=zeros(length(n),1);
clgs2_time=clgs_time;

for i = 1:length(n)
    A = rand(n(i));
    tic;
    [Q,R] = clgs(A);
    clgs_time(i) = toc;

    tic;
    [Q,R] = clgs2(A);
    clgs2_time(i) = toc;
end

semilogy(n,clgs_time,'b',n,clgs2_time,'r')
xlabel 'n', ylabel 'Time [seconds]'
legend('unvectorized CGS','vectorized CGS')
las3rjock
+6  A: 

Vectorization in MATLAB often means allocating a lot more memory (making a much larger array to avoid the loop, e.g. by Tony's trick). With the improved JIT compilation of loops in recent MATLAB versions, it's possible that the memory allocation required for your vectorized solution cancels out any advantage, but without seeing the code it's hard to say. MATLAB has an excellent line-by-line profiler, which should help you see which particular parts of the vectorized version are taking the time.
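The trade-off described here can be sketched outside MATLAB as well. Below is a hedged NumPy analogue (not the asker's code, which isn't available): `np.tile` stands in for `repmat`/Tony's-trick expansion, broadcasting stands in for `bsxfun`, and the column loop mirrors the unvectorized approach. The array sizes are borrowed from the question.

```python
import numpy as np

# Sizes borrowed from the question: 10,000 vectors of 784 elements.
A = np.random.rand(784, 10000)
v = np.random.rand(784, 1)

# "repmat"-style expansion: materialize a full-size copy of v first.
# The temporary V alone is 784*10000 doubles, roughly 60 MB.
V = np.tile(v, (1, A.shape[1]))
B1 = A - V

# Broadcasting (the NumPy analogue of bsxfun): no expanded temporary.
B2 = A - v

# Explicit loop over columns, as the unvectorized MATLAB code would do.
B3 = np.empty_like(A)
for j in range(A.shape[1]):
    B3[:, j] = A[:, j] - v[:, 0]

assert np.allclose(B1, B2) and np.allclose(B1, B3)
```

All three produce the same result; the difference is purely in how much temporary memory gets allocated along the way, which is exactly what a line-by-line profiler would surface.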

thrope
Yup, the large repmats are where all of the time is being spent in the vectorized version. I suspect @las3rjock has the right idea that my vectorized solution would probably be faster at some problem sizes but is slower at this one; I may do the plot he suggested just to check.
Donnie
Even with only 1000 vectors, the iterative version is still faster (1.8 seconds vs. 4.1 seconds). I'll revisit this later when I can share the code to see if I'm doing something dumb, as this is the first time I've run into a difference like this.
Donnie
Sometimes you can rewrite the code to avoid the slow **repmat** using **bsxfun** and the like...
Amro
@Amro: Do you think `bsxfun` works with a built-in loop? I've always wondered about the memory usage of `repmat` vs. `bsxfun`. I had tried the following: `A = rand(3,5e7); A = bsxfun(@minus,A,[1 2 3]');`. In general an in-place operation should prevent extra memory from being allocated (unlike `repmat`), but I get an `Out of memory` error! It looks like the only (and incredibly slow) way I can do it is: `for i = 1:size(A,2), A(:,i) = A(:,i)-[1 2 3]'; end`
Jacob
@Jacob - I suspect `bsxfun` doesn't work in place; it still has to allocate a new array for the output. One of my biggest dislikes of MATLAB is that all this stuff is a bit of a mystery...
thrope
@Jacob: in terms of memory usage, as @thrope said, `bsxfun` has to allocate memory for the output (just like any MATLAB function that returns a result). So unless you are allocating data of a size approaching the maximum capacity (see the `memory` function), you are still better off using `bsxfun` for speed. BTW, if you shrink `A` by half, I'd guess the above will work (on my machine, I can't even allocate `A` in the first place!). In terms of CPU time, however, `bsxfun` is way faster. Take a look at this comparison: http://blogs.mathworks.com/loren/2008/08/04/comparing-repmat-and-bsxfun-performance/
Amro
I think the answer is that you've vectorised it badly. _repmat_ is *really* slow and I always try to avoid it, especially for expanding scalar dimensions. As stated in the answer, the profiler is your friend.
Nzbuu
A: 

To answer the question "When not to vectorize MATLAB code" more generally:

Don't vectorize code if the vectorization is not straightforward and makes the code very hard to read. This holds under the assumption that

  1. Other people than you might need to read and understand it.
  2. The unvectorized code is fast enough for what you need.
kigurai