An interesting problem I've been pondering the past few days is how to copy one integer's bits into another integer at a given position in the destination integer. So, for example, given the destination integer 0xdeadbeef and the source integer 0xabcd, the idea would be to get a result of 0xabcdbeef (given a destination position of 16 bits) or 0xdeabcdef (given a destination position of 8 bits).

With the arbitrary limitation of avoiding conditionals or loops (allowing myself to use just mathematical/bitwise operations), I developed the following function (C++):

int setbits(int destination, int source, int at, int numbits)
{
    int ones = ((1<<(numbits))-1)<<at;
    return (ones|destination)^((~source<<at)&ones);
}

where at is the place where the source bits should be copied into the destination number (0-31) and numbits is the number of bits being copied from source (1-32). As far as I can tell, this algorithm works for all values except for at = 0 and numbits = 32 (the case when the entire destination integer is being overwritten by the source integer) due to the fact that 1<<32 results in 1 (since the shift wraps around) as opposed to 0.
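As a quick sanity check, the function does reproduce the examples from the first paragraph; this is just the question's own code wrapped in a compilable sketch:

```cpp
#include <cassert>

// The function from the question, reproduced verbatim.
int setbits(int destination, int source, int at, int numbits)
{
    int ones = ((1<<(numbits))-1)<<at;
    return (ones|destination)^((~source<<at)&ones);
}
```

For example, `setbits(0xdeadbeef, 0xabcd, 16, 16)` yields `0xabcdbeef` and `setbits(0xdeadbeef, 0xabcd, 8, 16)` yields `0xdeabcdef`, matching the examples above.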

My questions are:

  1. How is this normally done? Are there any particularly notable algorithms used (by notable, I'm asking if there are any particularly efficient tricks that can be used to do this)?
  2. Does my algorithm work as well as I think it does (that is, works for all values except at = 0 and numbits = 32)?
  3. Related to 1), is there any way to do this only using mathematical/bitwise operators? The algorithm for all values is trivial using conditions or loops, so I'm not interested in that.

Algorithm design is usually a weak point for me, so I have no idea whether or not my algorithm is 'as good as it gets' when only using mathematical/bitwise operations. Thanks

A: 

I think it could hardly be more efficient. Moreover, bitwise operations are much faster than any algebraic operations.

Havenard
I think you are wrong on both points. It can be more efficient (see my answer) and at least addition and subtraction are performed with the same speed as bitwise operations on every arch I know.
drhirsch
Your example is broken.
Havenard
+2  A: 

It's pretty good: I tried this alternate version, but yours was about 30% faster in testing:

    int[] bits = new int[] {0,1,3,7,15,31,63,127,255,511,1023
        ,2047,4095,8191,16383,32767,65535,131071,262143,524287
        ,1048575,2097151,4194303,8388607,16777215,33554431,67108863
        ,134217727,268435455,536870911,1073741823,2147483647,-1};

    public int setbits2(int destination, int source, int at, int numbits)
    {
        int ones = bits[numbits + at] & ~bits[at];
        return (destination & ~ones) | ((source << at) & ones);
    }
RBarryYoung
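For readers following along in C++, the same table-driven idea might be sketched like this (the function and array names here are my own; note the n = 13 entry must be 8191, i.e. (1&lt;&lt;13)-1, and the caller must keep numbits + at &lt;= 32):

```cpp
#include <cassert>

// masks[n] holds n low bits set, i.e. (1 << n) - 1; masks[32] is all ones.
static const int masks[33] = {
    0, 1, 3, 7, 15, 31, 63, 127, 255, 511, 1023,
    2047, 4095, 8191, 16383, 32767, 65535, 131071, 262143, 524287,
    1048575, 2097151, 4194303, 8388607, 16777215, 33554431, 67108863,
    134217727, 268435455, 536870911, 1073741823, 2147483647, -1};

int setbits_table(int destination, int source, int at, int numbits)
{
    // Bits [at, at + numbits) set: n low bits up to the top of the field,
    // minus the n low bits below the field.
    int ones = masks[numbits + at] & ~masks[at];
    return (destination & ~ones) | ((source << at) & ones);
}
```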
+3  A: 

I don't think it's the case that 1&lt;&lt;32 wraps (otherwise, why doesn't 2&lt;&lt;31 also wrap?); instead I think that internally a modulus 32 is applied to the second operand, so that 1&lt;&lt;32 is actually equivalent to 1&lt;&lt;0. Also, consider changing the parameter types from "int" to "unsigned int". To get the value of "ones" without running into the "1&lt;&lt;32" problem, you can do this:

unsigned int ones = (0xffffffff >> (32-numbits)) << at;

I don't believe there are any "standard" methods for this kind of operation. I'm sure there are other ways of combining bitwise operators to achieve the same outcome, but your algorithm is as good as any.

Having said that, though, maintainability and documentation are also important. Your function would benefit from a comment documenting the algorithm, especially to explain how you use the bitwise XOR -- which is clever, but not easy to understand at first glance.

Todd Owen
That substitutes a problem where `numbits == 0` for the one where `numbits == 32`, but since that doesn't make a whole lot of sense anyway it could be excluded from the allowed domain of the function.
caf
You're right, "wrap" was probably the wrong choice of words. I was meaning to refer to that internal modulus operation that is performed (that you mention).
GRB
This solution is architecture-dependent. Replace 0xffffffff with ~0 (no cost, compile time) and 32 with a macro INT_SIZE defined for the architectures you want to support.
fnieto
~0 is indeed slick. Here's a calculation for "ones" which achieves portability without even using INT_SIZE: ~(~0 << numbits) << at;
Todd Owen
I'm afraid not, @Todd: when you do (~0 << 32) you get 0xffffffff
fnieto
Oh right, that is so annoying! Is this behaviour of the bitwise shift operations standardized for C, or does it depend on SHL/SHR operations on the underlying architecture?? This will affect the boundary cases for not only *numbits*, but for *at* too (although that's probably more acceptable, since *at* > 31 really doesn't make any sense).
Todd Owen
@Todd: Haha, that's funny to see you make exactly the same mistake as you corrected me on (shows how unintuitive it is!). As far as the standard goes, 5.8/1 says the result is undefined if the right operand of a shift is negative, or greater than or equal to the number of bits in the promoted left operand.
GRB
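To sum up the thread above with a compilable sketch: the (0xffffffff >> (32 - numbits)) formula handles numbits == 32 (the shift count becomes 0), while ~(~0 << numbits) does not. The function name below is my own:

```cpp
#include <cassert>

// Mask of `numbits` ones starting at bit `at`, using the formula from the
// answer above. Valid for numbits in 1..32; numbits == 0 would shift by 32,
// which is undefined behaviour (caf's point).
unsigned int make_ones(unsigned int at, unsigned int numbits)
{
    return (0xffffffffu >> (32u - numbits)) << at;
}
```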
+3  A: 

I don't think it can be done more efficiently unless you write assembler.

You can improve the readability and solve your overflow problem by changing a few small things:

int setbits2(int destination, int source, int at, int numbits)
{
    // int mask = ((1LL<<numbits)-1)<<at; // 1st approach
    int mask = ((~0u)>>(sizeof(int)*8-numbits))<<at; // 2nd approach
    return (destination&~mask)|((source<<at)&mask);
}
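A quick check of the 2nd approach against the question's examples, including the numbits == 32 case that broke the original (the function is reproduced here just so the sketch is self-contained):

```cpp
#include <cassert>

// fnieto's 2nd approach, reproduced for testing.
// (~0u) >> (32 - numbits) builds numbits low ones without ever shifting by 32.
int setbits2(int destination, int source, int at, int numbits)
{
    int mask = ((~0u)>>(sizeof(int)*8-numbits))<<at;
    return (destination&~mask)|((source<<at)&mask);
}
```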

More efficient assembler version (VC++):

// 3rd approach
#define INT_SIZE 32
int setbits3(int destination, int source, int at, int numbits)
{ __asm {
    mov ecx, INT_SIZE
    sub ecx, numbits
    or eax, -1
    shr eax, cl
    mov ecx, at
    shl eax, cl // mask == eax
    mov ebx, eax
    not eax
    and eax, destination
    mov edx, source
    shl edx, cl
    and edx, ebx
    or eax, edx
}}
  • 1st approach: slower on 32-bit architectures (the 1LL shift forces 64-bit arithmetic)
  • 2nd approach: (~0u) and (sizeof(int)*8) are computed at compile time, so they incur no runtime cost
  • 3rd approach: writing it in assembler saves 3 ops (memory accesses), but you will need to write ifdefs if you want to make it portable
fnieto
On a 32-bit architecture this comes with a performance hit. He didn't ask for a code beautifier or for solving "the overflow problem", which might not even be there if the function is only used for numbits &lt; 32. He did ask for a faster version.
drhirsch
First, did you even read my first sentence?? Second, where did you read that the function is only used for numbits &lt; 32? "numbits is the number of bits being copied from source (1-32)". And third, readability is fundamental when you write algorithms, and that was just advice. It's true that it has a small performance hit on a 32-bit architecture, but contrary to the accepted solution it is architecture-portable.
fnieto
I updated the post with an optimized solution (2nd approach)
fnieto
@fnieto: sizeof(int)/sizeof(destination)/sizeof(source) would also work and not require the ACE header
GRB
@GRB: you are right, in this case I think sizeof will always evaluate at compile time. So I edited it to change ACE_SIZEOF_INT to sizeof(int). Anyway, it would be good if somebody confirmed this is not compiler-dependent. At a first look I don't find it in the standard.
fnieto
5.19/1 guarantees sizeof is a constant expression (and therefore I think that implies it is evaluated at compile-time, see 3.6.2/1 for that)
GRB
Yes, only in C99 is there a case (VLAs) where sizeof does not evaluate at compile time. Have a look at litb's answer here: http://stackoverflow.com/questions/671790/how-does-sizeofarray-work
fnieto