I'd say it's because the author of this code probably didn't really know what he was doing :-). The 16-bit versions of these instructions are longer (in 32-bit code each one needs an extra operand-size prefix byte) and are not any faster. In fact, they will probably cause a partial register stall on the next instruction that reads ECX (here, the MOV).
Also note that the jump can safely be moved one instruction earlier, right after the DEC, because DEC already sets ZF when its result is zero. This simplifies the code a bit, and it also means BSR never runs on a zero input, for which its destination is undefined.
So this is how I would write this code snippet:
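```asm
    mov  eax, [count]
    xor  ecx, ecx          ; result stays 0 if we take the jump
    dec  eax               ; eax = count - 1; sets ZF if it is 0
    jz   next
    bsr  ecx, eax          ; ecx = index of the highest set bit of eax
    inc  ecx               ; width = index + 1
next:
    mov  [maskWidth], ecx
```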
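To make the flag logic concrete, here is a hypothetical C model of the snippet above (`maskwidth_model` is an illustrative name, not part of the original code). It assumes a 32-bit count and mirrors the instructions one by one, with a shift loop standing in for BSR. It also exposes an edge case: a count of 0 makes DEC wrap around to 0xFFFFFFFF, so the snippet yields 32 for that input, whereas the GCC function below returns 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the assembly snippet, one statement per
 * instruction. A shift loop stands in for BSR. */
uint32_t maskwidth_model(uint32_t count)
{
    uint32_t eax = count;   /* mov eax, [count]                      */
    uint32_t ecx = 0;       /* xor ecx, ecx                          */
    eax -= 1;               /* dec eax: 0 wraps around to 0xFFFFFFFF */
    if (eax == 0)           /* jz next: ZF was set by the DEC        */
        return ecx;         /* next: mov [maskWidth], ecx            */

    /* bsr ecx, eax: index of the highest set bit. BSR's destination is
     * undefined for a zero source; the jz above rules that out. */
    assert(eax != 0);
    while (eax >>= 1)
        ecx++;

    ecx++;                  /* inc ecx                               */
    return ecx;             /* next: mov [maskWidth], ecx            */
}
```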
Also, the motivation for dropping to assembly here seems to be the BSR instruction, which has no equivalent in standard C or its library. You can avoid assembly altogether by using a compiler-specific intrinsic instead. Intrinsics are inherently nonportable, but so is inline assembly.
In GCC the equivalent function would look like this:
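```c
#include <limits.h>  /* CHAR_BIT */

unsigned int find_maskwidth(unsigned int itemCount)
{
    /* __builtin_clz(0) is undefined, so handle 0 and 1 up front */
    if (itemCount <= 1)
        return 0;
    /* bit width of unsigned int minus the leading zeros of (itemCount - 1) */
    return sizeof(unsigned int) * CHAR_BIT - __builtin_clz(itemCount - 1);
}
```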
Much more readable, isn't it?
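One way to gain confidence in the compact version is to compare it against a slow but obvious definition. A minimal sketch, assuming a C99 compiler; `maskwidth_reference` and `check_maskwidth` are hypothetical names, not part of GCC. The reference spells out what the function computes: the smallest `w` with 2^w >= itemCount, i.e. enough bits to hold every index from 0 to itemCount - 1.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical reference version for testing: the smallest width w with
 * 2^w >= itemCount. Portable (no intrinsics) but loops once per bit. */
unsigned int maskwidth_reference(unsigned int itemCount)
{
    unsigned int w = 0;
    /* The w bound keeps the 1ULL shift in range even if unsigned int
     * were as wide as unsigned long long. */
    while (w < sizeof(unsigned int) * CHAR_BIT && (1ULL << w) < itemCount)
        w++;
    return w;
}

/* Hypothetical test helper: assert that f agrees with the reference
 * for every item count from 0 to limit inclusive. */
void check_maskwidth(unsigned int (*f)(unsigned int), unsigned int limit)
{
    for (unsigned int n = 0; ; n++) {
        assert(f(n) == maskwidth_reference(n));
        if (n == limit)  /* checked after the assert so limit == UINT_MAX cannot overflow n */
            break;
    }
}
```

With GCC or Clang, `check_maskwidth(find_maskwidth, 100000)` exercises the builtin-based function against this definition.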