Suppose, in Matlab, that I have a matrix, A, whose elements are either 0 or 1.

How do I get a vector of the index of the last non-zero element of each column in a faster, vectorized way?

I could do

[B, I] = max(cumsum(A));

and use I, but is there a faster way? (I'm assuming cumsum would cost a bit of time even when summing only 0's and 1's).
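For a tiny example of why this works (the cumulative sum of a 0/1 column reaches its maximum at the last 1, and max returns the index of the first occurrence of that maximum):

A = [0 1; 1 0; 0 1; 1 0];  % last ones are in rows 4 and 3
[B, I] = max(cumsum(A));   % I = [4 3]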

Edit: I guess I want vectorization even more than I want speed - Mr. Fooz's loop is great, but every loop I write in Matlab seems to cost me a lot of debugging time, even if it runs fast.

+7  A: 

Fast is what you should worry about, not necessarily full vectorization. Recent versions of Matlab are much smarter about handling loops efficiently. If there's a compact vectorized way of expressing something, it's usually faster, but loops should not (always) be feared like they used to be.

clc

A = rand(5000)>0.5;
A(1,find(sum(A,1)==0)) = 1; % make sure there is at least one match

% Slow because it is doing too much work
tic;[B,I1]=max(cumsum(A));toc

% Fast because FIND is fast and it runs the inner loop
tic;
I3=zeros(1,5000);
for i=1:5000
  I3(i) = find(A(:,i),1,'last');
end
toc;
assert(all(I1==I3));

% Even faster because the JIT in Matlab is smart enough now
tic;
I2=zeros(1,5000);
for i=1:5000
  I2(i) = 0;
  for j=5000:-1:1
    if A(j,i)
      I2(i) = j;
      break;
    end
  end
end
toc;
assert(all(I1==I2));

On R2008a, Windows, x64, the cumsum version takes 0.9 seconds. The loop and find version takes 0.02 seconds. The double loop version takes a mere 0.001 seconds.

EDIT: Which one is fastest depends on the actual data. The double-loop version takes 0.05 seconds when you change the 0.5 to 0.999 (because, on average, it takes longer to hit the break). The cumsum and loop-plus-find implementations have more consistent speeds.

EDIT 2: gnovice's flipud solution is clever. Unfortunately, on my test machine it takes 0.1 seconds, so it's much faster than cumsum, but slower than the looped versions.

Mr Fooz
Wow, are there qualities of your loop that make it faster, or would any such loop be the fastest way to do a similar operation?
Joe Soul-bringer
Mr Fooz
Another interesting thing to note is that in many cases recent versions of Matlab are smart about expression parsing. E=A.*B.*C.*D; will be executed with no extra temporaries, element by element, similar to the way you would write the operation by hand in C. With multicore support enabled, Matlab tries to find disjoint parts of the operation and farm them out to different CPU cores. I don't know if it's smart enough to divide independent loop iterations up amongst the cores. For the tests I did, I used a Core 2 Duo processor.
Mr Fooz
On a multicore machine, you may even be able to incorporate a PARFOR (parallel for loop) for your outer loop, since the inner loop operates independently on each column.
gnovice
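A minimal sketch of that suggestion, applied to the find-based loop above (this assumes the Parallel Computing Toolbox is available; I4 is just an illustrative name):

I4 = zeros(1,5000);
parfor i = 1:5000
  I4(i) = find(A(:,i), 1, 'last');   % each column is handled independently
end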
I tried parfor, as suggested by gnovice. The first time it took nearly a second (probably because it had to load the parfor logic and/or create the worker pool). After a few runs, it dropped to 0.04 seconds. So at least on a simple loop like this one, with only 5000 iterations on a mere dual-core processor, the serial version is much faster.
Mr Fooz
Hmmm... I guess the work being done in the loop is so minimal that it is dwarfed by the extra overhead incurred from parallelization.
gnovice
Agreed. Micro parallelization often doesn't work well on CPUs.
Mr Fooz
+4  A: 

As shown by Mr Fooz, for loops can be pretty fast now with newer versions of MATLAB. However, if you really want to have compact vectorized code, I would suggest trying this:

[B,I] = max(flipud(A));  % MAX returns the first index of the maximum in each column of the flipped matrix
I = size(A,1)-I+1;       % convert indices in the flipped matrix back to row indices in A

This is faster than your CUMSUM-based answer, but still not quite as fast as Mr Fooz's looping options.

Two additional things to consider:

  • What results do you want to get for a column that has no ones in it at all? With the option I gave above, I believe you will get an index of size(A,1) (i.e. the number of rows in A) in such a case. With your CUMSUM-based option, I believe you will get a 1, while the nested-for-loops option from Mr Fooz will give you a 0 (see the short check after this list).

  • The relative speed of these different options will likely vary based on the size of A and the number of non-zeroes you expect it to have.
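A quick way to check the all-zeros behaviors described in the first point (illustration only; the variable names are throwaway):

Z = zeros(4,1);              % a single column containing no ones
[Bf, If] = max(flipud(Z));
If = size(Z,1) - If + 1;     % flipud/max answer: gives 4, i.e. size(Z,1)
[Bc, Ic] = max(cumsum(Z));   % cumsum answer: gives 1
% (the nested-for-loops version above would give 0 for this column)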

gnovice
Mr Fooz
That's kinda the result I expected: faster than CUMSUM but still slower than looping... although it's all still dependent on the size and fill-fraction of A (which the OP didn't really define).
gnovice
