views:

923

answers:

11

I want to calculate 2n-1 for a 64bit integer value. What I currently do is this

for(i=0; i<n; i++) r|=1<<i;

and I wonder if there is more elegant way to do it. The line is in an inner loop, so I need it to be fast.

I thought of

  r=(1ULL<<n)-1;

but it doesn't work for n=64, because << is only defined for values of n up to 63.


EDIT: Thanks for all your answers and comments. Here is a little table with the solutions that I tried and liked best. Second column is time in seconds of my (completely unscientific) benchmark.

    
r=N2MINUSONE_LUT[n];            3.9 lookup table = fastest, answer by aviraldg
r =n?~0ull>>(64 - n):0ull;      5.9 fastest without LUT, comment by Christoph
r=(1ULL<<n)-1;                  5.9 Obvious but WRONG!   
r =(n==64)?-1:(1ULL<<n)-1;      7.0 Short, clear and quite fast, answer by Gabe
r=((1ULL<<(n/2))<<((n+1)/2))-1; 8.2 Nice, w/o spec. case, answer by drawnonward
r=(1ULL<<n-1)+((1ULL<<n-1)-1);  9.2 Nice, w/o spec. case, answer by David Lively
r=pow(2, n)-1;               99.0 Just for comparison
for(i=0; i<n; i++) r|=1<<i;   123.7 My original solution = lame

I accepted

r =n?~0ull>>(64 - n):0ull;

as answer because it's in my opinion the most elegant solution. It was Christoph who came up with it at first, but unfortunately he only posted it in a comment. Jens Gustedt added a really nice rationale, so I accept his answer instead. Because I liked Aviral Dasgupta's lookup table solution it got 50 reputation points via a bounty.

+29  A: 

Use a lookup table. (Generated by your present code.) This is ideal, since the number of values is small, and you know the results already.

/* magic stuff -- do not touch! */
const static uint64_t N2MINUSONE_LUT[] = {
0x0,
0x1,
0x3,
0x7,
0xf,
0x1f,
0x3f,
0x7f,
0xff,
0x1ff,
0x3ff,
0x7ff,
0xfff,
0x1fff,
0x3fff,
0x7fff,
0xffff,
0x1ffff,
0x3ffff,
0x7ffff,
0xfffff,
0x1fffff,
0x3fffff,
0x7fffff,
0xffffff,
0x1ffffff,
0x3ffffff,
0x7ffffff,
0xfffffff,
0x1fffffff,
0x3fffffff,
0x7fffffff,
0xffffffff,
0x1ffffffff,
0x3ffffffff,
0x7ffffffff,
0xfffffffff,
0x1fffffffff,
0x3fffffffff,
0x7fffffffff,
0xffffffffff,
0x1ffffffffff,
0x3ffffffffff,
0x7ffffffffff,
0xfffffffffff,
0x1fffffffffff,
0x3fffffffffff,
0x7fffffffffff,
0xffffffffffff,
0x1ffffffffffff,
0x3ffffffffffff,
0x7ffffffffffff,
0xfffffffffffff,
0x1fffffffffffff,
0x3fffffffffffff,
0x7fffffffffffff,
0xffffffffffffff,
0x1ffffffffffffff,
0x3ffffffffffffff,
0x7ffffffffffffff,
0xfffffffffffffff,
0x1fffffffffffffff,
0x3fffffffffffffff,
0x7fffffffffffffff,
0xffffffffffffffff,
};
Aviral Dasgupta
This is actually quite useful, since the values are known ahead of time.
stw_dev
might look less mysterious if values were defined in hex
cobbal
Wow, thanks for the table. I did a quick test and it's almost twice as fast as the "special case n=64" solution (3.9s for my specific problem).
Ludwig Weinzierl
And make the table static const, since it's not going to change. :)
unwind
The `ull` suffix is unnecessary for hexadecimal integer literals.
James McNellis
When is the `ull` suffix necessary?
Gabe
67 lines of code instead of 1? If a LUT is necessary for speed (which is possible, but the effect of and on the cache is hard to measure), why is statically declaring it better than initialising with code?
Paul Hankin
@paul : It isn't. Nor have I said so. The OP's asked for speed and elegance. Those don't exactly come packaged with readability.
Aviral Dasgupta
might I suggest as a comment `/* lookup table: n -> 2^n-1 -- do not touch */` instead.
wds
@wds That comment was intended as a play on "Do not erase". Of course, you could modify it in actual code.
Aviral Dasgupta
+7  A: 
if (n > 64 || n < 0)
  return undefined...

if (n == 64)
  return 0xFFFFFFFFFFFFFFFFULL;

return (1ULL << n) - 1;
brickner
That sure is a `FULL` 64 bit integer :P (I usually change the case of the suffix from the hex number to eliminate confusion)
Billy ONeal
+24  A: 

How about a simple r = (n == 64) ? -1 : (1ULL<<n)-1;?

Gabe
I like that, it's concise, easy to understand and reasonably fast (7.0 s for my specific problem).
Ludwig Weinzierl
`n ? ~0ull >> (64 - n) : 0ull` is equivalent and *might* be faster because there's only a single 64bit op
Christoph
Pedantically, I prefer ~0 to -1, since -1 implies (to me, not the compiler) that you're working with signed integers.
Steve314
@Christoph you're a hacker. It's 5.9s vs 7.0s for the original.
Ludwig Weinzierl
+4  A: 

The only problem is that your expression isn't defined for n=64? Then special-case that one value.

(n == 64 ? 0ULL : (1ULL << n)) - 1ULL
Paul Hankin
+11  A: 

If you want to get the max value just before overflow with a given number of bits, try

r=(1ULL << n-1)+((1ULL<<n-1)-1);

By splitting the shift into two parts (in this case, two 63 bit shifts, since 2^64=2*2^63), subtracting 1 and then adding the two results together, you should be able to do the calculation without overflowing the 64 bit data type.

David Lively
David, you win a special price for the most elegant solution. It's a little bit slower than the "special case n=64" solution (9.2s for my specific problem).
Ludwig Weinzierl
@Ludwig, you might be able to speed it up by calculating 1ULL<<n-1 once in a temporary, if the compiler isn't doing it for you already.
Mark Ransom
+4  A: 

Shifting 1 << 64 in a 64 bit integer yields 0, so no need to compute anything for n > 63; shifting should be enough fast

r = n < 64 ? (1ULL << n) - 1 : 0;

But if you are trying this way to know the max value a N bit unsigned integer can have, you change 0 into the known value treating n == 64 as a special case (and you are not able to give a result for n > 64 on hardware with 64bit integer unless you use a multiprecision/bignumber library).

Another approach with bit tricks

~-(1ULL << (n-1) ) | (1ULL << (n-1))

check if it can be semplified... of course, n>0

EDIT

Tests I've done

__attribute__((regparm(0))) unsigned int calcn(int n)
{
  register unsigned int res;
  asm(
    "  cmpl $32, %%eax\n"
    "  jg   mmno\n"
    "  movl $1, %%ebx\n"      // ebx = 1
    "  subl $1, %%eax\n"      // eax = n - 1
    "  movb %%al, %%cl\n"     // because of only possible shll reg mode
    "  shll %%cl, %%ebx\n"    // ebx = ebx << eax
    "  movl %%ebx, %%eax\n"   // eax = ebx
    "  negl %%ebx\n"          // -ebx
    "  notl %%ebx\n"          // ~-ebx
    "  orl  %%ebx, %%eax\n"   // ~-ebx | ebx
    "  jmp  mmyes\n"
    "mmno:\n"
    "  xor %%eax, %%eax\n"
    "mmyes:\n"
    :
    "=eax" (res):
    "eax" (n):
    "ebx", "ecx", "cc"
    );
  return res;
}

#define BMASK(X) (~-(1ULL << ((X)-1) ) | (1ULL << ((X)-1)))
int main()
{
  int n = 32; //...
  printf("%08X\n", BMASK(n));
  printf("%08X %d %08X\n", calcn(n), n&31, BMASK(n&31));
  return 0;
}

Output with n = 32 is -1 and -1, while n = 52 yields "-1" and 0xFFFFF, casually 52&31 = 20 and of course n = 20 gives 0xFFFFF...

EDIT2 now the asm code produces 0 for n > 32 (since I am on a 32 bit machine), but at this point the a ? b : 0 solution with the BMASK is clearer and I doubt the asm solution is too much faster (if speed is a so big concern the table idea could be the faster).

ShinTakezou
Thanks for your solution Shin. The problem is that 1<<64 is not 0. If 1<<64 was 0 everything would be fine, because 0ULL-1ULL=18446744073709551615ULL in modular arithmetic. This is even standard behaviour and you and rely on this. The problem is that the result of 1<<64 is implementation dependant. On my machine, with my compiler 1ULL<<64 results in 1.
Ludwig Weinzierl
a `n<64 ? calcn(n) : 0` would solve anyway then? (with calcn(n) anything from asm code to ~-(A)|(A) or alike)?
ShinTakezou
@ShinTakezou: As promised I had a look at your code. I don't get why you do ~-(1ULL << (n-1) ) | (1ULL << (n-1)). I seems overly complicated while it doesn't solve the original problem (that I have to special case either 0 or 64). Am I missing something?
Ludwig Weinzierl
ShinTakezou
P.S. yes it's overly complicated as expression, it seems simpler `n>=64 ? 0 : (1ULL << n)-1`; but here a real subtraction must be performed, while "negating" could be a little bit more performant of subtracting, in asm, __but__ `neg r0; not r0; or r1, r1, r0` instead of `sub r1, r1, 1`, and there's a n-1 anyway, and the shift also (in common), so my solution has anyway more instructions and so it is yes, overly complicated and not efficient :( (but ~-A|A looks very fine to me:)
ShinTakezou
+3  A: 

Since you've asked for an elegant way to do it:

const uint64_t MAX_UINT64 = 0xffffffffffffffffULL;
#define N2MINUSONE(n) ((MAX_UINT64>>(64-(n))))
Aviral Dasgupta
Of the methods I've seen, I like this the best. Deterministic. No branching. Small footprint. Portable. Original thinking. Kudos!
Sparky
Unfortunately it gives a wrong result for n=0. Your lookup table has 65 entries (not 64) and that's correct because I asked for results from 2^0-1 up to 2^64-1. The right operand shift has only 64 possible meaningful values. That's why I (now) think that a solution with a single shift is not possible.
Ludwig Weinzierl
@ludwig : Looking at the comments above, I noticed that Cristoph's solution is exactly the same, apart from the fact that he's special casing 0.
Aviral Dasgupta
@Aviral Dasgupta: You left out the special case (which is important) but otherwise you are right. It's the fastest (except for the LUT) and probably most elegant solution so far.
Ludwig Weinzierl
+2  A: 

I think the issue you are seeing is caused because (1<<n)-1 is evaluated as (1<<(n%64))-1 on some chips. Especially if n is or can be optimized as a constant.

Given that, there are many minor variations you can do. For example:

((1ULL<<(n/2))<<((n+1)/2))-1;

You will have to measure to see if that is faster then special casing 64:

(n<64)?(1ULL<<n)-1:~0ULL;
drawnonward
@drawnonward: (1<<(64%64))-1=(1<<0)-1=0, Hmm, could be right. Works and is nearly as fast as the '64 special case' solution, I measured 8.3s.
Ludwig Weinzierl
+3  A: 
Norman Ramsey
+2  A: 

I like aviraldg answer best. Just to get rid of the `ULL' stuff etc in C99 I would do

static inline uint64_t n2minusone(unsigned n) {
   return n ? (~(uint64_t)0) >> (64u - n) : 0;
}

To see that this is valid

  • an uint64_t is guaranteed to have a width of exactly 64 bit
  • the bit negation of that `zero of type uint64_t' has thus exactly 64 one bits
  • right shift of an unsigned value is guaranteed to be a logical shift, so everything is filled with zeros from the left
  • shift with a value equal or greater to the width is undefined, so yes you have to do at least one conditional to be sure of your result
  • an inline function (or alternatively a cast to uint64_t if you prefer) makes this type safe; an unsigned long long may well be an 128 bit wide value in the future
  • a static inline function should be seamlessly inlined in the caller without any overhead
Jens Gustedt
Nice explanation and I agree.While aviraldg supplied the LUT solution your solution is basically Christoph's comment to Gabe's answer.It's also the second fastest after aviraldgs LUT.
Ludwig Weinzierl
Sure it must be slower, since it has the extra conditional. Can't compare the speed of a correct version and an incorrect one, can't you?BTW, your speed measures, I didn't see anything on this page where you define it, what compiler, cpu etc you use. The results will be much dependent on this.
Jens Gustedt
aviraldgs LUT solution is also correct, so the comparison is valid.
Ludwig Weinzierl
+2  A: 

It is true that in C each bit-shifting operation has to shift by less bits than there are bits in the operand (otherwise, the behavior is undefined). However, nobody prohibits you from doing the shift in two consecutive steps

r = ((1ULL << (n - 1)) << 1) - 1;

I.e. shift by n - 1 bits first and then make an extra 1 bit shift. In this case, of course, you have to handle n == 0 situation in a special way, if that is a valid input in your case.

In any case, it is better than your for cycle. The latter is basically the same idea but taken to the extreme for some reason.

AndreyT