views:

6771

answers:

3
+5  Q: 

Bit mask in C

What is the best way to construct a bit mask in C with m set bits preceded by k unset bits, and followed by n unset bits:

00..0 11..1 00..0
  k     m     n

For example, k=1, m=4, n=3 would result in the bit mask:

01111000
+21  A: 

~(~0 << m) << n

Darius Bacon
Both these work, and they're both good, so I'm upvoting two for the first time ever.
paxdiablo
You beat me by 3 seconds according to SO. :D
Jonathan Leffler
Thanks for the answers guys.
grigy
Jonothan, yeah, I was surprised!
Darius Bacon
This is slick. However, It would be a good idea to comment this line, for the -next- programmer to work on it.
mkClark
If this was coded as a function (the set_mask_n functions in @quinmar's answer), there'd be a one-line comment saying what the function does (and no argument 'k'), and users of the function would have the name as documentation. As a random expression in a bit of code, it would indubitably be BAD!
Jonathan Leffler
And, I would hasten (very slowly) to add, my solution would be equally inscrutable if it appeared as an uncommented expression in a bit of code.
Jonathan Leffler
+15  A: 

So, you are asking for m set bits prefixed by k reset bits and followed by n reset bits? We can ignore k since it will largely be constrained by the choice of integer type.

mask = ((1 << m) - 1) << n;
Jonathan Leffler
Both these work, and they're both good, so I'm upvoting two for the first time ever.
paxdiablo
They both work but I find Jonathan's answer to be simpler and clearer. Darius' answer is a little too backwards to me.
Robert Gamble
Thanks for the answers guys.
grigy
Robert, I like the ~0 idiom for bitmasks because it doesn't depend on 2's-complement, and is in that sense simpler, but it's true it's less well-known. Just doing my bit to change that!
Darius Bacon
@Darius: if you are using unsigned arithmetic/types, as you should in these contexts, isn't the difference between 2's-complement, 1's-complement and sign-magnitude arithmetic immaterial?
Jonathan Leffler
@Darius, you shouldn't be performing bitwise arithmetic on signed types in the first place and if you were, your solution invokes undefined behavior every time!
Robert Gamble
Is it undefined? I dont have spec at hand, but I think it is implementation defined, i.e. the compiler is allowed to do it like he wants, but he must always do it the same way. So when you know the treatment (of your compiler) you can rely on it.
flolo
C99 §6.5.7p4: "The result of E1 << E2 is E1 left-shifted E2 bit positions; [...] If E1 has a signed type and nonnegative value, and E1 * 2 ^ E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined."
Robert Gamble
I was wrong to say 2's-complement. Properly, what's simpler is that the ~0 code uses only shifts and individual-bit operations, not arithmetic depending on details of number representation. This is natural for what bitmasks are for. It's admittedly not worth confusing anyone over.
Darius Bacon
(continued) So (1<<m)-1 seems more surprising, if you come to it with fresh eyes, which we mostly don't.
Darius Bacon
@Darius: maybe your best shot is: if the integers are 32-bits and m = 32, then the behaviour of my expression is undefined (6.5.7, para 3 in C99). And I don't think I could refute that claim.
Jonathan Leffler
@Darius: continuing - however, the same argument can be used against yours...oh well. This is as near a tie as anything ever can be. Congratulations on getting there first.
Jonathan Leffler
I think yours might be a jot faster, Jonathan. (Haven't tested.) Honestly, I'm not a zealot in the grand ~0 cause -- it's funny how the littlest things start the longest threads.
Darius Bacon
@Darius: counting basic operations: both solutions have two shifts, by the same amounts; yours has one active '~' operation versus mine has one '-' operation (the '~0' is evaluated by the compiler, of course). Both expressions should have the same speed.
Jonathan Leffler
Yes, I had two speculative reasons in mind: ~0 is a more expensive constant than 1 on an architecture without a sign-extended constant-load operation, and a compiler that for some reason can take advantage of recognizing this idiom is more likely to look for the better-known one (yours).
Darius Bacon
+2  A: 

I like both solutions. Here is another way that comes to my mind (probably not better).

((~((unsigned int)0) << k) >> (k + n)) << n

EDIT: There was a bug in my previous version (it was without the unsigned int cast). The problem was that ~0 >> n adds 1s at the front and not 0s.

And yes this approach has one big downside; it assumes that you know the number of bits of the default integer type or in other words it assumes that you really know k, whereas the other solutions are independent of k. This makes my version less portable, or at least harder to port. (It also uses 3 shifts, and addition and a bitwise negation operator, which is two extra operations.)

So you would do better to use one of the other examples.

Here is a little test app, done by Jonathan Leffler, to compare and verify the output of the different solutions:

#include <stdio.h>
#include <limits.h>

enum { ULONG_BITS = (sizeof(unsigned long) * CHAR_BIT) };

static unsigned long set_mask_1(int k, int m, int n)
{
    return ~(~0 << m) << n;
}

static unsigned long set_mask_2(int k, int m, int n)
{
    return ((1 << m) - 1) << n;
}

static unsigned long set_mask_3(int k, int m, int n)
{
    return ((~((unsigned long)0) << k) >> (k + n)) << n;
}

static int test_cases[][2] =
{
    { 1, 0 },
    { 1, 1 },
    { 1, 2 },
    { 1, 3 },
    { 2, 1 },
    { 2, 2 },
    { 2, 3 },
    { 3, 4 },
    { 3, 5 },
};

int main(void)
{
    size_t i;
    for (i = 0; i < 9; i++)
    {
        int m = test_cases[i][0];
        int n = test_cases[i][1];
        int k = ULONG_BITS - (m + n);
        printf("%d/%d/%d = 0x%08lX = 0x%08lX = 0x%08lX\n", k, m, n,
               set_mask_1(k, m, n),
               set_mask_2(k, m, n),
               set_mask_3(k, m, n));
    }
    return 0;
}
quinmars
On the assumption that this answer can be made to work, the obvious downside in comparison with the other two is the presence of the third shift operation, which makes it more time consuming.
Jonathan Leffler
The other problem is that it uses the parameter k which the other two solutions are able to ignore (it doesn't use m, though, so it still only uses two of the three parameters).
Jonathan Leffler
Right there was a bug in it, I fixed it now and add a comment that the other solutions are preferable. I haven't removed it completely, maybe someone can learn from my mistakes and it'd be sad to loose your nice test code :).
quinmars
Instead of the cast, you should be able to use '0U' to indicate an unsigned zero, or '0UL' to indicate an unsigned long. I agree with leaving your answer in place - and with the edits you made.
Jonathan Leffler