+2  A: 

EDIT: NEW SOLUTION

It appears that you want to repeat the calculation for every element in a 4096-by-4096 array of UINT32 values. If this is what you are doing, I think the fastest way to do it in MATLAB is to use the fact that BITGET is designed to operate on matrices of values. The code would look like this:

numArray = ...your 4096-by-4096 matrix of uint32 values...
w = zeros(4096,4096,'uint32');
for iBit = 1:32,
  w = w+bitget(numArray,iBit);
end

If you want to make vectorized versions of some of the other algorithms, I believe BITAND is also designed to operate on matrices.


The old solution...

The easiest way I can think of is to use the DEC2BIN function, which gives you the binary representation (as a string) of a non-negative integer:

w = sum(dec2bin(num) == '1');  % Sums up the ones in the string

It's slow, but it's easy. =)

gnovice
The cast to double is not needed. You're technique works. Unfortunately, dec2bin() is dirt slow. I'm compiling a table of runtimes for all my approaches, and dec2bin is still running. (Well past the other techniques in terms of time).
nsanders
No wonder... I just now realized you're repeating the calculation 4096^2 times!!! I'll have to give it more thought to see if there are faster ways to handle so many calculations in native MATLAB.
gnovice
Very nice! I actually have a pair of loops each going from 1 to 4096. I vectorized the inner loop using your technique and overall runtime is at ~7.55 sec. I did have to pass in 'uint32' as my type to zeroes(4096,1,'uint32') for MATLAB to be happy. Trying now with outer loop vectorized too.
nsanders
It's crappy that I can only mark one "accepted" answer. You gave the vectorization idea; Scheiner gave the algorithm. Thank you very much gnovice.
nsanders
No problem... happy to help. =)
gnovice
+4  A: 

Unless this is a MATLAB implementation exercise, you might want to just take your fast C++ implementation and compile it as a mex function, once per target platform.

kwatford
Calling an external routine is pretty unappealing for my application. I'm still hoping to drop the MATLAB code run time to a few seconds.
nsanders
I'll take your word for it as it's your application. However, in my experience the only reason not to mex-ify MATLAB code is that for complex operations it's a bit of a hassle. But once you've got it coded up, mex files work just like normal MATLAB functions and they have platform-specific file extensions, so you can just provide them all in your package and MATLAB will figure it out automatically. You can even provide a fallback MATLAB implementation for platforms you don't have compile access to.
kwatford
+2  A: 

I'd be interested to see how fast this solution is:

function r = count_bits(n)

shifts = [-1, -2, -4, -8, -16];
masks = [1431655765, 858993459, 252645135, 16711935, 65535];

r = n;
for i=1:5
   r = bitand(bitshift(r, shifts(i)), masks(i)) + ...
      bitand(r, masks(i));
end

Going back, I see that this is the 'parallel' solution given on the bithacks page.

Justin Scheiner
I just posted performance using your pre-edit algorithm. This was with hex2dec pre-computed. I'm going to double check whether I did everything correctly and also try your cleaned up code.
nsanders
I think that this would be the fastest method by far for 64 bit integers. All the other methods are O(n) but this is O(logn). It would probably be significantly faster with the loop unrolled.
Justin Scheiner
I'm actually running a loop unrolled version right now. I am surprised by this methods poor performance in the looped version; I also thought it would be fastest.
nsanders
Vectorizing this code for my inner loop leads to 2.25 second performance. (Same outline as with gnovice's solution).
nsanders
A: 

Try splitting the job into smaller parts. My guess is that if you want to process all data at once, matlab is trying to do each operation on all integers before taking successive steps and the processor's cache is invalidated with each step.

for i=1:4096,
    «process bits(i,:)»
end
liori

related questions