ansaurus

Question

Turning a matlab binary matrix into a vector of the last nonzero index in a fast, vectorized fashion

Answer 1

+7 A:

Fast is what you should worry about, not necessarily full vectorization. Recent versions of Matlab are much smarter about handling loops efficiently. If there's a compact vectorized way of expressing something, it's usually faster, but loops should not (always) be feared like they used to be.

clc

A = rand(5000)>0.5;
A(1,find(sum(A,1)==0)) = 1; % make sure there is at least one match

% Slow because it is doing to much work
tic;[B,I1]=max(cumsum(A));toc

% Fast because FIND is fast and it runs the inner loop
tic;
I3=zeros(1,5000);
for i=1:5000
  I3(i) = find(A(:,i),1,'last');
end
toc;
assert(all(I1==I3));

% Even faster because the JIT in Matlab is smart enough now
tic;
I2=zeros(1,5000);
for i=1:5000
  I2(i) = 0;
  for j=5000:-1:1
    if A(j,i)
      I2(i) = j;
      break;
    end
  end
end
toc;
assert(all(I1==I2));

On R2008a, Windows, x64, the cumsum version takes 0.9 seconds. The loop and find version takes 0.02 seconds. The double loop version takes a mere 0.001 seconds.

EDIT: Which one is fastest depends on the actual data. The double-loop takes 0.05 seconds when you change the 0.5 to 0.999 (because it takes longer to hit the break; on average). cumsum and the loop&find implementation have more consistent speeds.

EDIT 2: gnovice's flipud solution is clever. Unfortunately, on my test machine it takes 0.1 seconds, so it's much faster than cumsum, but slower than the looped versions.

Mr Fooz 2009-05-06 21:52:53

Wow, are qualities of your loop that make it faster or would any such loop be the fastest way to do any similar operation?

Joe Soul-bringer 2009-05-07 04:45:10

Mr Fooz 2009-05-07 13:46:30

Another interesting thing to note is that in many cases recent versions of Matlab are smart about expression parsing. E=A.*B.*C.*D; will be executed with no extra temporaries, element-by-element, similar to the way you would do it if manually writing the operation in C. With multicore support enabled, Matlab tries to find disjoint parts of the operation and farm them out to different CPU cores. I don't know if it's smart enough to divide up independent loop iterations amongst the calls. For the tests I did, I used a Core 2 Duo proc.

Mr Fooz 2009-05-07 13:49:37

On a multicore machine, you may even be able to incorporate a PARFOR (parallel for loop) for your outer loop, since the inner loop operates independently on each column.

gnovice 2009-05-07 15:43:47

I tried parfor, as suggested by gnovice. The first time it took nearly a second (probably because it had to load in parfor logic and/or create the worker thread pool). After a few runs, it drops down to 0.04 seconds. So at least on a simple loop like this one with only 5000 iterations on a mere dual-core processor, the serial version is much faster.

Mr Fooz 2009-05-07 19:58:22

Hmmm... I guess the work being done in the loop is so minimal that it is dwarfed by the extra overhead incurred from parallelization.

gnovice 2009-05-08 14:47:20

Agreed. Micro parallelization often doesn't work well on CPUs.

Mr Fooz 2009-05-08 15:46:30

Answer 2

+4 A:

As shown by Mr Fooz, for loops can be pretty fast now with newer versions of MATLAB. However, if you really want to have compact vectorized code, I would suggest trying this:

[B,I] = max(flipud(A));
I = size(A,1)-I+1;

This is faster than your CUMSUM-based answer, but still not quite as fast as Mr Fooz's looping options.

Two additional things to consider:

What results do you want to get for a column that has no ones in it at all? With the above option I gave you, I believe you will get an index of size(A,1) (i.e. the number of rows in A) in such a case. For your option, I believe you will get a 1 in such a case, while the nested-for-loops option from Mr Fooz will give you a 0.
The relative speed of these different options will likely vary based on the size of A and the number of non-zeroes you expect it to have.

gnovice 2009-05-06 22:17:16

Mr Fooz 2009-05-06 23:24:21

That's kinda the result I expected: faster than CUMSUM but still slower than looping... although it's all still dependent on the size and fill-fraction of A (which the OP didn't really define).

gnovice 2009-05-07 00:03:47

ansaurus

tags:

views:

answers:

Turning a matlab binary matrix into a vector of the last nonzero index in a fast, vectorized fashion

related questions