views:

341

answers:

4

I have a question about constructing bitmasks in C. I need to mask out the least-significant half of a long int, so that I am left with only the upper half. The mask must cover exactly half the bits whether I am on a 64-bit or a 32-bit platform. I see that __WORDSIZE is defined in limits.h. Initially I am doing it like this:

#define UPPER(X) ( X & ( ~0 << (__WORDSIZE/2) ) )

What is the most correct and efficient way to do it?

+1  A: 

What you have is good. Constant propagation will collapse the ( ~0 << (__WORDSIZE/2) ) into a single value, so long as __WORDSIZE is constant, which it is.
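
One way to check the compile-time claim is to use the mask where C requires a constant expression, such as a file-scope initializer. This is only a sketch: the names upper_mask and keep_upper are hypothetical, and it assumes an unsigned long operand, ~0UL instead of ~0, and sizeof with CHAR_BIT in place of __WORDSIZE.

```c
#include <limits.h>  /* CHAR_BIT */

/* File-scope initializers must be constant expressions, so this line
   only compiles if the mask can be computed at compile time. */
static const unsigned long upper_mask =
    ~0UL << (sizeof(unsigned long) * CHAR_BIT / 2);

/* Hypothetical helper: keeps only the upper half of the bits of x. */
static unsigned long keep_upper(unsigned long x)
{
    return x & upper_mask;
}
```

If the initializer is accepted, no shift needs to be performed at run time to build the mask.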

David Seiler
cool. thanks. That is exactly what I was worried about, that the ops to reduce it had to be performed each time.
mushroom_picker
+12  A: 

I would suggest you use something like

#define UPPER(x) ((x) & (~0UL << (sizeof(x) * 4)))

This will work even if limits.h is not present or if for some reason __WORDSIZE is not defined. Moreover, it will also work for other types, so you could e.g. use it on an int, a short, a char, etc.
Any decent compiler will calculate the value of

sizeof(x) * 4

at compile time (since they are both constants), which means you do not have to worry about any performance hit there.

EDIT: corrected error - sizeof returns size in bytes not bits, so we have to multiply by 4 (8 / 2) to get the correct result. Thanks to those who pointed that out.

EDIT 2: If you want to be really pedantic, you could use

#define UPPER(x) ((x) & (~0UL << (sizeof(x) * CHAR_BIT / 2)))

CHAR_BIT is a constant defined in limits.h - it specifies the number of bits in a character, and is platform specific. However, this isn't really necessary (in general), since AFAIK there are no platforms in general use ATM that use bytes of a non-standard size.
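
For the unsigned long case from the question, the idea might be sketched as below. The names UPPER_HALF_MASK and upper_half are hypothetical and not from this thread. The sketch starts from ~0UL because a plain ~0 has type int, and shifting an int by its full width (32 bits on typical 64-bit platforms) is undefined behaviour.

```c
#include <limits.h>  /* CHAR_BIT */

/* Hypothetical names. Assumes the operand is an unsigned long whose
   width in bits is even. */
#define UPPER_HALF_MASK (~0UL << (sizeof(unsigned long) * CHAR_BIT / 2))

static unsigned long upper_half(unsigned long x)
{
    /* Clear the low half of the bits and keep the high half. */
    return x & UPPER_HALF_MASK;
}
```

For example, with a 64-bit unsigned long, upper_half(0x1122334455667788UL) would give 0x1122334400000000UL.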

a_m0d
This is wrong. sizeof() doesn't return you the size in bits. Please correct.
Igor Krivokon
Justicle
Done - sorry 'bout that
a_m0d
Please thank with comment upvotes :-)
Justicle
Technically, you should use '#include <limits.h>' and CHAR_BIT, just in case the code is run on a 36-bit machine.
Jonathan Leffler
Sorry about being picky :) but it's still not 100% correct. The sizeof returns the size in chars, not in bytes. On most systems it's the same. But not always. The right thing to do is to use sizeof(char)/2 instead of 4.
Igor Krivokon
Note that __WORDSIZE is not defined on many platforms -- it is not available on Solaris, amongst others. It isn't even defined under Cygwin.
Jonathan Leffler
@Igor Krivokon: no, sizeof(char) == 1 on all machines. You need to multiply by CHAR_BIT too. (You're right that not all platforms have 8 bits per byte, if that is what you are suggesting.)
Jonathan Leffler
Technically correct - the best kind of correct. :-) Good call. Out of curiosity has anything since the PDP-10 had non-8 bit bytes?
Justicle
@Igor Krivokon: No, sizeof() returns the size in bytes - see http://publications.gbdirect.co.uk/c_book/chapter5/sizeof_and_malloc.html for more info
a_m0d
Anyone know how to determine the number of bits in a char without using limits.h? I know it has to be defined there, but just out of curiosity
a_m0d
@a_m0d: You *might* be able to do some ugly hack with bitfields, e.g. find the N such that sizeof(struct{int x:N}) != sizeof(struct{int x:N+1}), but that would be an EXTREMELY ugly hack. Just use CHAR_BIT from <limits.h>.
Adam Rosenfield
@a_m0d: there is no simple, reliable mechanism other than <limits.h> for determining CHAR_BIT - at least, not at compile time. Clearly, you could poke around with loops etc. in the executable. I suppose some variation on '#if ((char)(1 << 8)) == 0' might work; I'm not sure that it counts as simple, though.
Jonathan Leffler
I believe there are still some Burroughs mainframes out there with odd-ball word sizes (48-bit? 60-bit - see http://en.wikipedia.org/wiki/Burroughs_large_systems).
Jonathan Leffler
a_m0d
@a_m0d: Sweet. This is what I was looking for. Took me a second with the '*4'. But I see, as in '(sizeof(X) * 8)/2'.
mushroom_picker
@Jonathan: And in all cases we are still assuming the word size is an even number of bits
mushroom_picker
@mushroom_picker: If this answers your question, please accept an answer so that others can also use the solution
a_m0d
@Jonathan - of course, you're right about sizeof(char); I made exactly the same mistake that I asked to correct in a first place ;) that's ironic
Igor Krivokon
+3  A: 
#define UPPER(X) ( (X) & ( ~0L << ( ( sizeof(long) * CHAR_BIT ) / 2 ) ) )
Dingo
+1  A: 

I try not to be clever. I would do something like this:

static inline long int UPPER(long int x) {
  if (sizeof(long int) == 8)
    return x & 0xffffffff00000000;
  else if (sizeof(long int) == 4)
    return x & 0xffff0000;
}

Let the compiler and optimizer do the work, and the code is clear for any future maintainer. If supporting a 36-bit processor in the future is a concern, add an else clause that triggers some error condition, so you can deal with that when it comes up.
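
The error condition for unexpected sizes could also be raised at compile time. The following is a sketch under stated assumptions: it relies on ULONG_MAX from <limits.h>, works on unsigned long rather than long int to avoid sign issues, and the names UPPER_MASK_CHECKED and upper_half_checked are hypothetical.

```c
#include <limits.h>  /* ULONG_MAX */

/* Pick a literal mask for the widths we know about and refuse to
   compile for anything else. */
#if ULONG_MAX == 0xffffffffffffffff
#define UPPER_MASK_CHECKED 0xffffffff00000000UL
#elif ULONG_MAX == 0xffffffff
#define UPPER_MASK_CHECKED 0xffff0000UL
#else
#error "unsupported width for unsigned long"
#endif

/* Hypothetical helper: keeps only the upper half of the bits of x. */
static unsigned long upper_half_checked(unsigned long x)
{
    return x & UPPER_MASK_CHECKED;
}
```

This keeps the masks as readable literals, and a port to an unusual platform fails loudly at build time instead of returning a wrong value.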

Neil
long int is allowed to be *any* number of bytes larger than or equal to 4, so to be truly general you would have an infinite number of cases, even ignoring such things as machines with non-8-bit bytes.
Tyler McHenry
Sure, but is that the original question? It is unlikely that the size of an unsigned long will not be a multiple of 4 on typical processors today or in the near future. It is also unlikely that a processor would not have 8-bit bytes, except in some very specific embedded applications. So, I would make the code as clear and simple as I could. If I'm working on a processor with unusual-sized bytes, I expect there would be a considerable effort porting the rest of the code, and this is just one more thing to port.
Neil