ansaurus

Question

How to calculate 2^n-1 efficiently without overflow?

Answer 1

+29 A:

Use a lookup table. (Generated by your present code.) This is ideal, since the number of values is small, and you know the results already.

/* magic stuff -- do not touch! */
const static uint64_t N2MINUSONE_LUT[] = {
0x0,
0x1,
0x3,
0x7,
0xf,
0x1f,
0x3f,
0x7f,
0xff,
0x1ff,
0x3ff,
0x7ff,
0xfff,
0x1fff,
0x3fff,
0x7fff,
0xffff,
0x1ffff,
0x3ffff,
0x7ffff,
0xfffff,
0x1fffff,
0x3fffff,
0x7fffff,
0xffffff,
0x1ffffff,
0x3ffffff,
0x7ffffff,
0xfffffff,
0x1fffffff,
0x3fffffff,
0x7fffffff,
0xffffffff,
0x1ffffffff,
0x3ffffffff,
0x7ffffffff,
0xfffffffff,
0x1fffffffff,
0x3fffffffff,
0x7fffffffff,
0xffffffffff,
0x1ffffffffff,
0x3ffffffffff,
0x7ffffffffff,
0xfffffffffff,
0x1fffffffffff,
0x3fffffffffff,
0x7fffffffffff,
0xffffffffffff,
0x1ffffffffffff,
0x3ffffffffffff,
0x7ffffffffffff,
0xfffffffffffff,
0x1fffffffffffff,
0x3fffffffffffff,
0x7fffffffffffff,
0xffffffffffffff,
0x1ffffffffffffff,
0x3ffffffffffffff,
0x7ffffffffffffff,
0xfffffffffffffff,
0x1fffffffffffffff,
0x3fffffffffffffff,
0x7fffffffffffffff,
0xffffffffffffffff,
};

Aviral Dasgupta 2010-06-13 16:24:00

This is actually quite useful, since the values are known ahead of time.

stw_dev 2010-06-13 16:26:19

might look less mysterious if values were defined in hex

cobbal 2010-06-13 16:43:57

Wow, thanks for the table. I did a quick test and it's almost twice as fast as the "special case n=64" solution (3.9s for my specific problem).

Ludwig Weinzierl 2010-06-13 17:16:46

And make the table static const, since it's not going to change. :)

unwind 2010-06-13 17:42:31

The `ull` suffix is unnecessary for hexadecimal integer literals.

James McNellis 2010-06-13 17:49:15

When is the `ull` suffix necessary?

Gabe 2010-06-13 18:49:55

67 lines of code instead of 1? If a LUT is necessary for speed (which is possible, but the effect of and on the cache is hard to measure), why is statically declaring it better than initialising with code?

Paul Hankin 2010-06-13 20:46:42

@paul : It isn't. Nor have I said so. The OP's asked for speed and elegance. Those don't exactly come packaged with readability.

Aviral Dasgupta 2010-06-13 20:49:41

might I suggest as a comment `/* lookup table: n -> 2^n-1 -- do not touch */` instead.

wds 2010-06-16 08:17:56

@wds That comment was intended as a play on "Do not erase". Of course, you could modify it in actual code.

Aviral Dasgupta 2010-06-17 09:00:26

Answer 2

+7 A:

if (n > 64 || n < 0)
  return undefined...

if (n == 64)
  return 0xFFFFFFFFFFFFFFFFULL;

return (1ULL << n) - 1;

brickner 2010-06-13 16:28:19

That sure is a `FULL` 64 bit integer :P (I usually change the case of the suffix from the hex number to eliminate confusion)

Billy ONeal 2010-06-14 21:57:21

Answer 3

+24 A:

How about a simple r = (n == 64) ? -1 : (1ULL<<n)-1;?

Gabe 2010-06-13 16:30:07

I like that, it's concise, easy to understand and reasonably fast (7.0 s for my specific problem).

Ludwig Weinzierl 2010-06-13 17:23:35

`n ? ~0ull >> (64 - n) : 0ull` is equivalent and *might* be faster because there's only a single 64bit op

Christoph 2010-06-13 21:10:16

Pedantically, I prefer ~0 to -1, since -1 implies (to me, not the compiler) that you're working with signed integers.

Steve314 2010-06-13 22:32:27

@Christoph you're a hacker. It's 5.9s vs 7.0s for the original.

Ludwig Weinzierl 2010-06-14 18:22:35

Answer 4

+4 A:

The only problem is that your expression isn't defined for n=64? Then special-case that one value.

(n == 64 ? 0ULL : (1ULL << n)) - 1ULL

Paul Hankin 2010-06-13 16:38:57

Answer 5

+11 A:

If you want to get the max value just before overflow with a given number of bits, try

r=(1ULL << n-1)+((1ULL<<n-1)-1);

By splitting the shift into two parts (in this case, two 63 bit shifts, since 2^64=2*2^63), subtracting 1 and then adding the two results together, you should be able to do the calculation without overflowing the 64 bit data type.

David Lively 2010-06-13 16:39:23

David, you win a special price for the most elegant solution. It's a little bit slower than the "special case n=64" solution (9.2s for my specific problem).

Ludwig Weinzierl 2010-06-13 17:27:11

@Ludwig, you might be able to speed it up by calculating 1ULL<<n-1 once in a temporary, if the compiler isn't doing it for you already.

Mark Ransom 2010-06-14 21:38:08

Answer 6

+4 A:

Shifting 1 << 64 in a 64 bit integer yields 0, so no need to compute anything for n > 63; shifting should be enough fast

r = n < 64 ? (1ULL << n) - 1 : 0;

But if you are trying this way to know the max value a N bit unsigned integer can have, you change 0 into the known value treating n == 64 as a special case (and you are not able to give a result for n > 64 on hardware with 64bit integer unless you use a multiprecision/bignumber library).

Another approach with bit tricks

~-(1ULL << (n-1) ) | (1ULL << (n-1))

check if it can be semplified... of course, n>0

EDIT

Tests I've done

__attribute__((regparm(0))) unsigned int calcn(int n)
{
  register unsigned int res;
  asm(
    "  cmpl $32, %%eax\n"
    "  jg   mmno\n"
    "  movl $1, %%ebx\n"      // ebx = 1
    "  subl $1, %%eax\n"      // eax = n - 1
    "  movb %%al, %%cl\n"     // because of only possible shll reg mode
    "  shll %%cl, %%ebx\n"    // ebx = ebx << eax
    "  movl %%ebx, %%eax\n"   // eax = ebx
    "  negl %%ebx\n"          // -ebx
    "  notl %%ebx\n"          // ~-ebx
    "  orl  %%ebx, %%eax\n"   // ~-ebx | ebx
    "  jmp  mmyes\n"
    "mmno:\n"
    "  xor %%eax, %%eax\n"
    "mmyes:\n"
    :
    "=eax" (res):
    "eax" (n):
    "ebx", "ecx", "cc"
    );
  return res;
}

#define BMASK(X) (~-(1ULL << ((X)-1) ) | (1ULL << ((X)-1)))
int main()
{
  int n = 32; //...
  printf("%08X\n", BMASK(n));
  printf("%08X %d %08X\n", calcn(n), n&31, BMASK(n&31));
  return 0;
}

Output with n = 32 is -1 and -1, while n = 52 yields "-1" and 0xFFFFF, casually 52&31 = 20 and of course n = 20 gives 0xFFFFF...

EDIT2 now the asm code produces 0 for n > 32 (since I am on a 32 bit machine), but at this point the a ? b : 0 solution with the BMASK is clearer and I doubt the asm solution is too much faster (if speed is a so big concern the table idea could be the faster).

ShinTakezou 2010-06-13 17:33:36

Thanks for your solution Shin. The problem is that 1<<64 is not 0. If 1<<64 was 0 everything would be fine, because 0ULL-1ULL=18446744073709551615ULL in modular arithmetic. This is even standard behaviour and you and rely on this. The problem is that the result of 1<<64 is implementation dependant. On my machine, with my compiler 1ULL<<64 results in 1.

Ludwig Weinzierl 2010-06-13 17:52:15

a `n<64 ? calcn(n) : 0` would solve anyway then? (with calcn(n) anything from asm code to ~-(A)|(A) or alike)?

ShinTakezou 2010-06-13 19:03:25

@ShinTakezou: As promised I had a look at your code. I don't get why you do ~-(1ULL << (n-1) ) | (1ULL << (n-1)). I seems overly complicated while it doesn't solve the original problem (that I have to special case either 0 or 64). Am I missing something?

Ludwig Weinzierl 2010-06-14 21:13:26

ShinTakezou 2010-06-15 12:41:06

P.S. yes it's overly complicated as expression, it seems simpler `n>=64 ? 0 : (1ULL << n)-1`; but here a real subtraction must be performed, while "negating" could be a little bit more performant of subtracting, in asm, __but__ `neg r0; not r0; or r1, r1, r0` instead of `sub r1, r1, 1`, and there's a n-1 anyway, and the shift also (in common), so my solution has anyway more instructions and so it is yes, overly complicated and not efficient :( (but ~-A|A looks very fine to me:)

ShinTakezou 2010-06-15 12:48:31

Answer 7

+3 A:

Since you've asked for an elegant way to do it:

const uint64_t MAX_UINT64 = 0xffffffffffffffffULL;
#define N2MINUSONE(n) ((MAX_UINT64>>(64-(n))))

Aviral Dasgupta 2010-06-13 20:45:24

Of the methods I've seen, I like this the best. Deterministic. No branching. Small footprint. Portable. Original thinking. Kudos!

Sparky 2010-06-14 15:50:01

Unfortunately it gives a wrong result for n=0. Your lookup table has 65 entries (not 64) and that's correct because I asked for results from 2^0-1 up to 2^64-1. The right operand shift has only 64 possible meaningful values. That's why I (now) think that a solution with a single shift is not possible.

Ludwig Weinzierl 2010-06-14 18:10:03

@ludwig : Looking at the comments above, I noticed that Cristoph's solution is exactly the same, apart from the fact that he's special casing 0.

Aviral Dasgupta 2010-06-14 22:17:45

@Aviral Dasgupta: You left out the special case (which is important) but otherwise you are right. It's the fastest (except for the LUT) and probably most elegant solution so far.

Ludwig Weinzierl 2010-06-14 22:31:36

Answer 8

+2 A:

I think the issue you are seeing is caused because (1<<n)-1 is evaluated as (1<<(n%64))-1 on some chips. Especially if n is or can be optimized as a constant.

Given that, there are many minor variations you can do. For example:

((1ULL<<(n/2))<<((n+1)/2))-1;

You will have to measure to see if that is faster then special casing 64:

(n<64)?(1ULL<<n)-1:~0ULL;

drawnonward 2010-06-13 21:02:25

@drawnonward: (1<<(64%64))-1=(1<<0)-1=0, Hmm, could be right. Works and is nearly as fast as the '64 special case' solution, I measured 8.3s.

Ludwig Weinzierl 2010-06-14 17:52:02

Answer 9

+3 A:

Norman Ramsey 2010-06-13 22:25:46

Answer 10

+2 A:

I like aviraldg answer best. Just to get rid of the `ULL' stuff etc in C99 I would do

static inline uint64_t n2minusone(unsigned n) {
   return n ? (~(uint64_t)0) >> (64u - n) : 0;
}

To see that this is valid

an uint64_t is guaranteed to have a width of exactly 64 bit
the bit negation of that `zero of type uint64_t' has thus exactly 64 one bits
right shift of an unsigned value is guaranteed to be a logical shift, so everything is filled with zeros from the left
shift with a value equal or greater to the width is undefined, so yes you have to do at least one conditional to be sure of your result
an inline function (or alternatively a cast to uint64_t if you prefer) makes this type safe; an unsigned long long may well be an 128 bit wide value in the future
a static inline function should be seamlessly inlined in the caller without any overhead

Jens Gustedt 2010-06-14 21:05:35

Nice explanation and I agree.While aviraldg supplied the LUT solution your solution is basically Christoph's comment to Gabe's answer.It's also the second fastest after aviraldgs LUT.

Ludwig Weinzierl 2010-06-14 21:22:54

Sure it must be slower, since it has the extra conditional. Can't compare the speed of a correct version and an incorrect one, can't you?BTW, your speed measures, I didn't see anything on this page where you define it, what compiler, cpu etc you use. The results will be much dependent on this.

Jens Gustedt 2010-06-14 22:39:11

aviraldgs LUT solution is also correct, so the comparison is valid.

Ludwig Weinzierl 2010-06-20 21:55:13

Answer 11

+2 A:

It is true that in C each bit-shifting operation has to shift by less bits than there are bits in the operand (otherwise, the behavior is undefined). However, nobody prohibits you from doing the shift in two consecutive steps

r = ((1ULL << (n - 1)) << 1) - 1;

I.e. shift by n - 1 bits first and then make an extra 1 bit shift. In this case, of course, you have to handle n == 0 situation in a special way, if that is a valid input in your case.

In any case, it is better than your for cycle. The latter is basically the same idea but taken to the extreme for some reason.

AndreyT 2010-06-14 22:04:04

ansaurus

tags:

views:

answers:

How to calculate 2^n-1 efficiently without overflow?

related questions