I have a 32-bit number and want to know how many of its bits are 1.

I'm thinking of this pseudocode:

mov eax, [number]
while(eax != 0)
{
  div eax, 2
  if(edx == 1)
  {
   ecx++;
  } 
  shr eax, 1
}

Is there a more efficient way?

I'm using NASM on an x86 processor.

(I'm just beginning with assembler, so please do not tell me to use code from external libraries, because I do not even know how to include them ;) )

(I just found http://stackoverflow.com/questions/109023/best-algorithm-to-count-the-number-of-set-bits-in-a-32-bit-integer which also contains my solution. Other solutions are posted there, but unfortunately I can't figure out how I would write them in assembler.)

+3  A: 

The most efficient way (in terms of execution time, anyway) is to have a lookup table. Obviously you're not going to have a 4-billion entry table, but you could break the 32 bits down into 8-bit chunks and only need a 256-entry table, or further down into 4-bit chunks and only need 16 entries. Good luck!

Carl Norum
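
For readers who want to see the table idea spelled out, here is a minimal C sketch of the 8-bit-chunk variant. The names bits_in_byte, init_bit_table and popcount_table are made up for this illustration, and the table is filled at startup rather than written out by hand; treat it as one possible layout, not the answerer's own code.

```c
#include <stdint.h>

/* Hypothetical names, chosen for this sketch only. */
static uint8_t bits_in_byte[256];

/* Fill the table once. The count for i equals the count for i with its
   lowest bit shifted out, plus that lowest bit. */
void init_bit_table(void) {
    bits_in_byte[0] = 0;
    for (int i = 1; i < 256; i++)
        bits_in_byte[i] = (uint8_t)(bits_in_byte[i >> 1] + (i & 1));
}

/* Sum the table entries for the four bytes of x.
   init_bit_table() must have been called beforehand. */
unsigned popcount_table(uint32_t x) {
    return (unsigned)bits_in_byte[x & 0xFF]
         + bits_in_byte[(x >> 8) & 0xFF]
         + bits_in_byte[(x >> 16) & 0xFF]
         + bits_in_byte[x >> 24];
}
```

In assembler the same shape would be four byte extractions, four indexed loads from the table, and three adds.
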
If upfront cost is a problem, you could build the lookup table as you go. Only ONE entry has zero 1 bits, and that's 0x00. Therefore, if any other entry in the lookup table is still 0, you know you need to count that one, and once you've calc'd it, you can store it there. This way, you don't have to count all 256 when you start.
glowcoder
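
The fill-as-you-go idea from the comment above could look roughly like this in C. The names lazy_bits, bits_in_byte_lazy and popcount_lazy are hypothetical, and the sketch relies on the table starting out zeroed so that a 0 in any slot other than index 0 means "not computed yet".

```c
#include <stdint.h>

/* Static storage is zero-initialized, so every slot starts at 0.
   Index 0 is the only slot whose true count is 0. */
static uint8_t lazy_bits[256];

/* Return the number of set bits in b, computing and caching it on first use. */
unsigned bits_in_byte_lazy(uint8_t b) {
    if (b != 0 && lazy_bits[b] == 0) {
        unsigned v = b, count = 0;
        while (v) {            /* plain bit-by-bit count, done once per byte value */
            count += v & 1u;
            v >>= 1;
        }
        lazy_bits[b] = (uint8_t)count;
    }
    return lazy_bits[b];
}

/* Count the set bits of a 32-bit value one byte at a time. */
unsigned popcount_lazy(uint32_t x) {
    return bits_in_byte_lazy((uint8_t)(x & 0xFF))
         + bits_in_byte_lazy((uint8_t)((x >> 8) & 0xFF))
         + bits_in_byte_lazy((uint8_t)((x >> 16) & 0xFF))
         + bits_in_byte_lazy((uint8_t)(x >> 24));
}
```

The extra compare on every lookup is the price of skipping the upfront fill.
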
@glowcoder, that's a good suggestion. This question sounds like a homework problem, though, so I think it's a bit overkill. I'd say it's much less complicated to just pre-generate the table.
Carl Norum
You can do a 32 bit population count in 15 - 20 instructions (see e.g. Hacker's Delight by Warren). Breaking up the word into 8 bit chunks, doing 4 table lookups and then summing the 4 results is probably not going to be as efficient as this, and it doesn't lend itself to optimisation, e.g. SIMD, GPGPU, etc.
Paul R
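
As an illustration of the kind of table-free population count the comment above refers to, here is a commonly cited divide-and-conquer variant in C. It is a sketch of the general technique, not necessarily the exact listing from Hacker's Delight, and the function name popcount_parallel is made up for this example.

```c
#include <stdint.h>

/* Add neighbouring bit fields in parallel: first pairs of bits, then
   nibbles, then bytes, and finally sum the four bytes with a multiply. */
unsigned popcount_parallel(uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);                  /* each 2-bit field holds its own count */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  /* each 4-bit field holds its own count */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  /* each byte holds its own count */
    return (x * 0x01010101u) >> 24;                    /* top byte = sum of all four bytes */
}
```

Each line maps onto a handful of shift, and, add and sub instructions, with no branches and no memory access.
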
The table access could be much slower than a clever computation inside the CPU.
starblue
+1  A: 

My x86 assembler is a bit rusty, but this comes to mind:

clc            ; clear carry
xor ecx, ecx   ; clear ecx

shl eax, 1     ; shift off one bit into carry
adc ecx, 0     ; add carry flag to ecx
; ... repeat the last two opcodes 31 more times

ecx contains your bit count.

spoulson
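
For comparison, the same shift-and-add-carry idea can be sketched in C, with the top bit standing in for the carry flag. The function name popcount_shift is hypothetical, and this version uses a loop where the assembler above repeats the two instructions.

```c
#include <stdint.h>

/* Mirror of the shl/adc sequence: take the top bit (the one shl would
   move into carry), add it to the count, then shift it off. */
unsigned popcount_shift(uint32_t x) {
    unsigned count = 0;
    for (int i = 0; i < 32; i++) {
        count += x >> 31;   /* 0 or 1: the bit that would land in the carry flag */
        x <<= 1;            /* shift that bit off the top */
    }
    return count;
}
```

Like the unrolled assembler, this always does 32 steps no matter how many bits are set.
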
+1  A: 

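On processors that support it, the POPCNT instruction does this for you. It arrived alongside SSE4.2 on Intel and SSE4a on AMD, but has its own CPUID feature flag, so check that flag rather than SSE4 itself.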
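
If you end up calling this from C, a compiler builtin is one way to reach that instruction. The sketch below assumes GCC or Clang; __builtin_popcount is a compiler extension, not standard C, and the wrapper name popcount_builtin is made up for this example.

```c
#include <stdint.h>

/* Assumes GCC or Clang. With a flag such as -mpopcnt the compiler can
   emit a single POPCNT instruction here; without it, it falls back to
   a software routine that gives the same result. */
unsigned popcount_builtin(uint32_t x) {
    return (unsigned)__builtin_popcount(x);
}
```

In NASM the equivalent is the single instruction popcnt eax, eax on a CPU that supports it.
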

The most naive algorithm is actually faster than what you thought up (DIV instructions are really slow).

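    mov eax, [number]
    xor ecx, ecx        ; ecx = bit count
loop_start:
    test eax, 1         ; is the lowest bit set?
    jz next             ; no: skip the increment
    inc ecx
next:
    shr eax, 1          ; drop the lowest bit; sets ZF once eax becomes 0
    jnz loop_start      ; repeat while any bits remain
    mov eax, ecx        ; leave the count in eax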

Regarding your comment about previous SO answers, I'm going to take an example answer from there and walk you through how I'll convert it.

unsigned int count_bits(unsigned int n) {
  unsigned int c; // c accumulates the total bits set in n
  for (c = 0; n; c++)
    n &= n - 1; // clear the least significant bit set
  return c;
}

(I'm going to assume you know how to define a function and fun stuff like that). What is needed is a very simple loop, a counter variable (traditionally, ecx is both the index and a counter), and bit testing instructions.

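    mov edx, [n]        ; load the value (assumes n is a dword in memory)
    xor ecx, ecx        ; c = 0
loop_start:
    test edx, edx       ; n == 0?
    jz end
    mov eax, edx        ; eax as scratch; ebx is callee-saved in the usual 32-bit conventions
    dec eax             ; eax = n - 1
    and edx, eax        ; n &= n - 1, clearing the lowest set bit
    inc ecx             ; c++
    jmp loop_start
end:
    mov eax, ecx        ; return the count in eax
    ret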

Implementing something like the Hamming Weight algorithm in assembly isn't complicated, but is just complicated enough that you'd rather not do it as an initial homework problem.

Daniel Goldberg