I'm working on some MATLAB code that processes large (but not huge) datasets: 10,000 784-element vectors (not sparse), computing information about them that is stored in a 10,000x10 sparse matrix. To get the code working I wrote the trickier parts iteratively: loops over the 10k items to process them, and a loop over the 10 columns of the sparse matrix for cleanup.

My process initially took 73 iterations (so, on the order of 730k loop passes) and ran in about 120 seconds. Not bad, but this is MATLAB, so I set out to vectorize it to speed it up.

In the end I have a fully vectorized solution that gets the same answer (so it's correct, or at least as correct as my initial solution), but it takes 274 seconds to run: less than half the speed of the loop version!

This is the first time I've run into MATLAB code that runs slower vectorized than it does iteratively. Are there any rules of thumb or best practices for identifying when this is likely or possible?

I'd love to share the code for some feedback, but it's for a currently open school assignment, so I really can't right now. If it ends up being one of those "Wow, that's weird, you probably did something wrong" things, I'll revisit this in a week or two to see if my vectorization is somehow off.

+4  A: 

Have you tried plotting the execution time as a function of problem size (either the number of elements per vector [currently 784], or the number of vectors [currently 10,000])? I ran into a similar anomaly when vectorizing a Gram-Schmidt orthogonalization algorithm: the vectorized version was faster until the problem grew to a certain size, at which point the iterative version actually ran faster, as seen in this plot:

[Plot: execution time comparison between vectorized and unvectorized implementations of the Gram-Schmidt orthogonalization algorithm]

Here are the two implementations and the benchmarking script:

clgs.m

function [Q,R] = clgs(A)
% QR factorization by unvectorized classical Gram-Schmidt orthogonalization

[m,n] = size(A);

R = zeros(n,n);     % pre-allocate upper-triangular matrix

% iterate over columns
for j = 1:n
    v = A(:,j);

    % project out components along the previously orthonormalized columns
    for i = 1:j-1
        R(i,j) = A(:,i)' * A(:,j);
        v = v - R(i,j) * A(:,i);
    end

    R(j,j) = norm(v);
    A(:,j) = v / norm(v);   % normalize
end
Q = A;

clgs2.m

function [Q,R] = clgs2(A)
% QR factorization by classical Gram-Schmidt orthogonalization with a
% vectorized inner loop

[m,n] = size(A);
R = zeros(n,n);     % pre-allocate upper-triangular matrix

for k=1:n
    R(1:k-1,k) = A(:,1:k-1)' * A(:,k);
    A(:,k) = A(:,k) - A(:,1:k-1) * R(1:k-1,k);
    R(k,k) = norm(A(:,k));
    A(:,k) = A(:,k) / R(k,k);
end

Q = A;

benchgs.m

n = [300,350,400,450,500];

clgs_time=zeros(length(n),1);
clgs2_time=clgs_time;

for i = 1:length(n)
    A = rand(n(i));
    tic;
    [Q,R] = clgs(A);
    clgs_time(i) = toc;

    tic;
    [Q,R] = clgs2(A);
    clgs2_time(i) = toc;
end

semilogy(n,clgs_time,'b',n,clgs2_time,'r')
xlabel 'n', ylabel 'Time [seconds]'
legend('unvectorized CGS','vectorized CGS')
las3rjock
+6  A: 

Vectorization in MATLAB often means allocating a lot more memory (making a much larger array to avoid the loop, e.g. by Tony's trick). With the improved JIT compilation of loops in recent MATLAB versions, it's possible that the memory allocation required for your vectorized solution cancels out any advantage, but without seeing the code it's hard to say. MATLAB has an excellent line-by-line profiler, which should help you see which particular parts of the vectorized version are taking the time.
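The trade-off described here can be sketched outside MATLAB as well. Below is a hedged NumPy analogue (not the asker's code, which isn't available): `np.tile` stands in for `repmat`/Tony's-trick expansion, broadcasting stands in for `bsxfun`, and the column loop mirrors the unvectorized approach. The array sizes are borrowed from the question.

```python
import numpy as np

# Sizes borrowed from the question: 10,000 vectors of 784 elements.
A = np.random.rand(784, 10000)
v = np.random.rand(784, 1)

# "repmat"-style expansion: materialize a full-size copy of v first.
# The temporary V alone is 784*10000 doubles, roughly 60 MB.
V = np.tile(v, (1, A.shape[1]))
B1 = A - V

# Broadcasting (the NumPy analogue of bsxfun): no expanded temporary.
B2 = A - v

# Explicit loop over columns, as the unvectorized MATLAB code would do.
B3 = np.empty_like(A)
for j in range(A.shape[1]):
    B3[:, j] = A[:, j] - v[:, 0]

assert np.allclose(B1, B2) and np.allclose(B1, B3)
```

All three produce the same result; the difference is purely in how much temporary memory gets allocated along the way, which is exactly what a line-by-line profiler would surface.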

thrope
Yup, the large repmats are where all of the time is being spent in the vectorized version. I suspect @las3rjock has the right idea that my vectorized solution would probably be faster at some problem sizes but is slower at this one; I may do the plot he suggested just to check.
Donnie
Even with only 1000 vectors, the iterative version is still faster (1.8 seconds vs. 4.1 seconds). I'll revisit this later when I can share the code to see if I'm doing something dumb, as this is the first time I've run into a difference like this.
Donnie
Sometimes you can rewrite the code to avoid the slow **repmat** using **bsxfun** and the like...
Amro
@Amro: Do you think `bsxfun` works with a built-in loop? I've always wondered about the memory usage of `repmat` vs. `bsxfun`. I had tried the following: `A = rand(3,5e7); A = bsxfun(@minus,A,[1 2 3]');`. In general an in-place operation should prevent extra memory from being allocated (unlike `repmat`), but I get an `Out of memory` error! It looks like the only (and incredibly slow) way I can do it is: `for i = 1:size(A,2), A(:,i) = A(:,i)-[1 2 3]'; end`
Jacob
@Jacob - I suspect `bsxfun` doesn't work in place; it still has to allocate a new array for the output. One of my biggest dislikes of MATLAB is that all this stuff is a bit of a mystery...
thrope
@Jacob: in terms of memory usage, as @thrope said, `bsxfun` has to allocate memory for the output (just like any MATLAB function that returns a result). So unless you are allocating data of a size approaching the maximum capacity (see the `memory` function), you are still better off using `bsxfun` for speed. BTW, if you shrink `A` by half, I'd guess the above will work (on my machine, I can't even allocate `A` in the first place!). In terms of CPU time, however, `bsxfun` is way faster. Take a look at this comparison: http://blogs.mathworks.com/loren/2008/08/04/comparing-repmat-and-bsxfun-performance/
Amro
I think the answer is that you've vectorised it badly. _repmat_ is *really* slow and I always try to avoid it, especially for expanding scalar dimensions. As stated in the answer, the profiler is your friend.
Nzbuu
A: 

To answer the question "When not to vectorize MATLAB code" more generally:

Don't vectorize code if the vectorization is not straightforward and makes the code very hard to read. This holds under the assumption that

  1. Other people than you might need to read and understand it.
  2. The unvectorized code is fast enough for what you need.
kigurai