ansaurus

Question

Bit mask in C

Answer 1

+21 A:

~(~0 << m) << n

Darius Bacon 2008-11-25 06:21:44

Both these work, and they're both good, so I'm upvoting two for the first time ever.

paxdiablo 2008-11-25 06:28:31

You beat me by 3 seconds according to SO. :D

Jonathan Leffler 2008-11-25 06:28:31

Thanks for the answers guys.

grigy 2008-11-25 06:34:31

Jonothan, yeah, I was surprised!

Darius Bacon 2008-11-25 07:13:47

This is slick. However, It would be a good idea to comment this line, for the -next- programmer to work on it.

mkClark 2008-11-25 23:41:23

If this was coded as a function (the set_mask_n functions in @quinmar's answer), there'd be a one-line comment saying what the function does (and no argument 'k'), and users of the function would have the name as documentation. As a random expression in a bit of code, it would indubitably be BAD!

Jonathan Leffler 2008-11-27 05:55:55

And, I would hasten (very slowly) to add, my solution would be equally inscrutable if it appeared as an uncommented expression in a bit of code.

Jonathan Leffler 2008-12-10 01:56:53

Answer 2

+15 A:

So, you are asking for m set bits prefixed by k reset bits and followed by n reset bits? We can ignore k since it will largely be constrained by the choice of integer type.

mask = ((1 << m) - 1) << n;

Jonathan Leffler 2008-11-25 06:21:47

Both these work, and they're both good, so I'm upvoting two for the first time ever.

paxdiablo 2008-11-25 06:29:01

They both work but I find Jonathan's answer to be simpler and clearer. Darius' answer is a little too backwards to me.

Robert Gamble 2008-11-25 06:34:25

Thanks for the answers guys.

grigy 2008-11-25 06:35:06

Robert, I like the ~0 idiom for bitmasks because it doesn't depend on 2's-complement, and is in that sense simpler, but it's true it's less well-known. Just doing my bit to change that!

Darius Bacon 2008-11-25 07:10:42

@Darius: if you are using unsigned arithmetic/types, as you should in these contexts, isn't the difference between 2's-complement, 1's-complement and sign-magnitude arithmetic immaterial?

Jonathan Leffler 2008-11-25 07:18:45

@Darius, you shouldn't be performing bitwise arithmetic on signed types in the first place and if you were, your solution invokes undefined behavior every time!

Robert Gamble 2008-11-25 07:20:06

Is it undefined? I dont have spec at hand, but I think it is implementation defined, i.e. the compiler is allowed to do it like he wants, but he must always do it the same way. So when you know the treatment (of your compiler) you can rely on it.

flolo 2008-11-25 07:52:55

C99 §6.5.7p4: "The result of E1 << E2 is E1 left-shifted E2 bit positions; [...] If E1 has a signed type and nonnegative value, and E1 * 2 ^ E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined."

Robert Gamble 2008-11-25 07:55:42

I was wrong to say 2's-complement. Properly, what's simpler is that the ~0 code uses only shifts and individual-bit operations, not arithmetic depending on details of number representation. This is natural for what bitmasks are for. It's admittedly not worth confusing anyone over.

Darius Bacon 2008-11-25 08:14:31

(continued) So (1<<m)-1 seems more surprising, if you come to it with fresh eyes, which we mostly don't.

Darius Bacon 2008-11-25 08:18:56

@Darius: maybe your best shot is: if the integers are 32-bits and m = 32, then the behaviour of my expression is undefined (6.5.7, para 3 in C99). And I don't think I could refute that claim.

Jonathan Leffler 2008-11-25 08:26:51

@Darius: continuing - however, the same argument can be used against yours...oh well. This is as near a tie as anything ever can be. Congratulations on getting there first.

Jonathan Leffler 2008-11-25 08:28:18

I think yours might be a jot faster, Jonathan. (Haven't tested.) Honestly, I'm not a zealot in the grand ~0 cause -- it's funny how the littlest things start the longest threads.

Darius Bacon 2008-11-25 08:45:22

@Darius: counting basic operations: both solutions have two shifts, by the same amounts; yours has one active '~' operation versus mine has one '-' operation (the '~0' is evaluated by the compiler, of course). Both expressions should have the same speed.

Jonathan Leffler 2008-11-25 15:47:52

Yes, I had two speculative reasons in mind: ~0 is a more expensive constant than 1 on an architecture without a sign-extended constant-load operation, and a compiler that for some reason can take advantage of recognizing this idiom is more likely to look for the better-known one (yours).

Darius Bacon 2008-11-25 17:29:14

Answer 3

+2 A:

I like both solutions. Here is another way that comes to my mind (probably not better).

((~((unsigned int)0) << k) >> (k + n)) << n

EDIT: There was a bug in my previous version (it was without the unsigned int cast). The problem was that ~0 >> n adds 1s at the front and not 0s.

And yes this approach has one big downside; it assumes that you know the number of bits of the default integer type or in other words it assumes that you really know k, whereas the other solutions are independent of k. This makes my version less portable, or at least harder to port. (It also uses 3 shifts, and addition and a bitwise negation operator, which is two extra operations.)

So you would do better to use one of the other examples.

Here is a little test app, done by Jonathan Leffler, to compare and verify the output of the different solutions:

#include <stdio.h>
#include <limits.h>

enum { ULONG_BITS = (sizeof(unsigned long) * CHAR_BIT) };

static unsigned long set_mask_1(int k, int m, int n)
{
    return ~(~0 << m) << n;
}

static unsigned long set_mask_2(int k, int m, int n)
{
    return ((1 << m) - 1) << n;
}

static unsigned long set_mask_3(int k, int m, int n)
{
    return ((~((unsigned long)0) << k) >> (k + n)) << n;
}

static int test_cases[][2] =
{
    { 1, 0 },
    { 1, 1 },
    { 1, 2 },
    { 1, 3 },
    { 2, 1 },
    { 2, 2 },
    { 2, 3 },
    { 3, 4 },
    { 3, 5 },
};

int main(void)
{
    size_t i;
    for (i = 0; i < 9; i++)
    {
        int m = test_cases[i][0];
        int n = test_cases[i][1];
        int k = ULONG_BITS - (m + n);
        printf("%d/%d/%d = 0x%08lX = 0x%08lX = 0x%08lX\n", k, m, n,
               set_mask_1(k, m, n),
               set_mask_2(k, m, n),
               set_mask_3(k, m, n));
    }
    return 0;
}

quinmars 2008-11-25 09:19:34

On the assumption that this answer can be made to work, the obvious downside in comparison with the other two is the presence of the third shift operation, which makes it more time consuming.

Jonathan Leffler 2008-11-25 14:45:02

The other problem is that it uses the parameter k which the other two solutions are able to ignore (it doesn't use m, though, so it still only uses two of the three parameters).

Jonathan Leffler 2008-11-25 14:46:13

Right there was a bug in it, I fixed it now and add a comment that the other solutions are preferable. I haven't removed it completely, maybe someone can learn from my mistakes and it'd be sad to loose your nice test code :).

quinmars 2008-11-25 19:34:58

Instead of the cast, you should be able to use '0U' to indicate an unsigned zero, or '0UL' to indicate an unsigned long. I agree with leaving your answer in place - and with the edits you made.

Jonathan Leffler 2008-11-25 21:06:00

ansaurus

tags:

views:

answers:

Bit mask in C

related questions