tags:
views: 199
answers: 5

1) On a 32-bit CPU, is it faster to access an array of 32 boolean values or to access the 32 bits within one word? (Assume we want to check the value of the Nth element and can use either a bit-mask (Nth bit is set) or the integer N as an array index.)

It seems to me that the array would be faster because all common computer architectures natively work at the word level (32 bits, 64 bits, etc., processed in parallel) and accessing the sub-word bits takes extra work.
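To make the comparison concrete, the two access patterns I have in mind look roughly like this (a sketch; the names are made up for illustration):

    #include <cstdint>

    bool          flags_array[32];    // one boolean per array element
    std::uint32_t flags_word = 0;     // one boolean per bit

    // Array form: use N directly as an index.
    bool check_array(int n) { return flags_array[n]; }

    // Bit-mask form: shift and mask to isolate the Nth bit.
    bool check_bits(int n)  { return (flags_word & (1u << n)) != 0; }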

I know different compilers will represent things differently, but it seems that the underlying hardware architecture would dictate the answer. Or does the answer depend on the language and compiler?

And 2) Is the speed answer reversed if this array represents state that I pass between client and server? This question came to mind while reading the question "How use bit/bit-operator to control object state?"

P.S. Yes, I could write code to test this myself, but then the SO community wouldn't get to play along!

A: 

If you are going to check more than one value at a time, doing it in parallel will obviously be faster. If you're only checking one value, it's probably the same.
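For example, with packed bits several flags can be tested with a single mask, whereas a bool array needs one load per element (a sketch; the bit positions are arbitrary):

    #include <cstdint>

    // Packed bits: "are bits 3, 7 and 12 all set?" is one AND and one compare.
    bool all_set(std::uint32_t flags_word) {
        const std::uint32_t mask = (1u << 3) | (1u << 7) | (1u << 12);
        return (flags_word & mask) == mask;
    }

    // bool array: the same check is one load and test per element.
    bool all_set(const bool flags_array[32]) {
        return flags_array[3] && flags_array[7] && flags_array[12];
    }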

If you need a better answer than that, write some tests and get back to us.

Mark Ransom
+3  A: 

For question #1: Yes, on most 32-bit platforms, an array of boolean values should be faster, because you will just be loading each 32-bit-aligned value in the array and testing it against 0. If you use a single word, you will have all that work plus the overhead of bit-fiddling.

For question #2: Again, yes, since sending data over a network is significantly slower than operating on data in the CPU and main memory, the overhead of sending even a single word will strongly outweigh any performance gain or loss you get from word alignment or bit fiddling.
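A rough sketch of what that means for the payload (purely illustrative; assumes exactly 32 flags and no particular serialization library):

    #include <cstdint>

    // Packed: all 32 flags travel as 4 bytes.
    std::uint32_t pack(const bool flags[32]) {
        std::uint32_t word = 0;
        for (int i = 0; i < 32; ++i)
            if (flags[i]) word |= (1u << i);
        return word;
    }

    // Unpacked on arrival: the bit-fiddling cost is trivial next to the network round trip.
    void unpack(std::uint32_t word, bool flags[32]) {
        for (int i = 0; i < 32; ++i)
            flags[i] = (word & (1u << i)) != 0;
    }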

MattK
+4  A: 

Bear in mind that a theoretically faster solution that doesn't fit into a cache line might be slower than a theoretically slower one that does, depending on a whole host of things. If this is actually something that needs to be fast, as determined by profiling, test both ways and see. If it doesn't, do whatever looks like cleaner code, which is probably the array.
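A throwaway harness along these lines is usually enough to see which way your compiler and cache lean (a sketch only; the iteration count is arbitrary and the volatile sink is just there to keep the loops from being optimized away):

    #include <chrono>
    #include <cstdint>
    #include <cstdio>

    int main() {
        static bool flags_array[32];
        std::uint32_t flags_word = 1u << 7;
        flags_array[7] = true;

        using clock = std::chrono::steady_clock;
        volatile unsigned sink = 0;   // defeats dead-code elimination

        auto t0 = clock::now();
        for (std::uint32_t i = 0; i < 100000000u; ++i)
            sink = sink + flags_array[i & 31u];
        auto t1 = clock::now();
        for (std::uint32_t i = 0; i < 100000000u; ++i)
            sink = sink + ((flags_word >> (i & 31u)) & 1u);
        auto t2 = clock::now();

        auto us = [](auto d) {
            return (long long)std::chrono::duration_cast<std::chrono::microseconds>(d).count();
        };
        std::printf("array: %lld us, bits: %lld us\n", us(t1 - t0), us(t2 - t1));
        return 0;
    }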

David Thornley
Yup. Processor cache swapping is a big deal here. Also, timeslicing can mess up your "tests" pretty badly. You have to be careful!
windfinder
+1  A: 

This is the code generated for 0 != (value & (1 << index)) to test a bit:

00401000  mov         eax,1 
00401005  shl         eax,cl 
00401007  and         eax,1

And this for values[index] to test a bool[]:

00401000  movzx       eax,byte ptr [ecx+eax]

I can't figure out how to put a loop around it that doesn't get optimized away, so I'll vote for bool[].

Hans Passant
+3  A: 

It depends on the compiler, the access patterns, and the platform. Raymond Chen has an excellent cost-benefit analysis: http://blogs.msdn.com/oldnewthing/archive/2008/11/26/9143050.aspx

Even on non-x86 platforms the use of bits can be prohibitive: at least one PPC platform out there uses microcoded instructions to perform a variable shift, which can do nasty things to other hardware threads.

So it can be a win, but you need to understand the context in which it will be good and bad. (Which is a general thing anyway.)

MSN
+1 I was about to search for this blog entry, but you did it for me.
Tmdean