views:

139

answers:

10

Given two bytes, how would I find the length of the common bits at the start of the two bytes.

For example:

9 == 00001001
6 == 00000110

Common prefix is 0000, length 4

I'm working in C#, so please stick to C# operations only.

Addendum: This particular piece of code will run thousands of times and needs to be very fast.

+2  A: 

If you're in a space-limited environment (which obviously you're not if you're using C#, but just in general) and can't afford a lookup table:

byte test = byte1 ^ byte2;
int length = 0;
if ((test & 0x80) == 0)
{
    if ((test & 0x40) == 0)
    {
        if ((test & 0x20) == 0)
        {
            if ((test & 0x10) == 0)
            {
                // I think you get the idea by now.
                // Repeat for the lower nibble.
            }
            else
                length = 3;
        }
        else
            length = 2;
    }
    else
        length = 1;
}

This is basically an unraveled loop to find the first 1 bit in the XOR'd number. I don't think it can get any faster than this without the lookup table.

jdmichal
Upvote for the space limited solution, which isn't what I need but it's interesting nonetheless
Martin
A: 
int i;
for (i=0;i<sizeof(byte);i++)
    if (a >> sizeof(byte)-i != b >> sizeof(byte)-i) break;
Mark H
+6  A: 
byte x = 9;
byte y = 6;

while ( x != y )
{
    x >>= 1;
    y >>= 1;
}

Basically, remove a bit from the right of each number until the two are equal. When they become equal, their bits are equal too.

You can keep track of the length of the prefix easily by introducing another variable. I'll leave that to you.

If you want it to be fast, and considering that you're dealing with bytes, why not precompute the values and return the answer in a single operation? Run this algorithm for all possible combinations of two bytes and store the result in a table.

You only have 2^8 * 2^8 = 2^16 possibilities (2^15 actually, because x = 6 and y = 9 is the same as x = 9 and y = 6). If you can afford the initial time and memory, precomputation should be fastest in the end.

Edit:

You got a solution that's at least better for precomputation and probably faster in general: find the leftmost 1 bit in x ^ y. Using this, build a table Pre where Pre[i] = position of leftmost 1 bit in i. You only need 2^8 bytes for this table.

IVlad
I find this approach the most readable.
mquander
This loop may never terminate.
Mike Daniels
Ooooo. I like it. And if `byte` or `uint` is used, it will terminate.
jdmichal
Why would it not terminate? If the two numbers have no bits in common, it would terminate once x == y == 0.
mquander
@mquander: What if the two numbers do not have the same sign?
Mike Daniels
@Mike Daniels - I admit I haven't thought about that, but that shouldn't be a problem for the OP if he's really working with bytes.
IVlad
@Mike: Bytes are unsigned in C#, so it's probably moot, but I admit that's a pretty good point.
mquander
C# does arithmetic shift if the type is signed. Meaning that if one number is negative and one is positive, you will infinite loop with one at all one bits and one at zero. http://msdn.microsoft.com/en-us/library/aa691377%28VS.71%29.aspx
jdmichal
@IVlad, @mquander: Oh right. Unsigned bytes. This is what Java does to your brain. ;P
Mike Daniels
You also only have 2^15 possibilities, since the operation is symmetrical. Just swap if the first argument is bigger or something similar.
jdmichal
Precomputation seems like the best idea, I should have thought of that! Also, this seems like the most reasonable "realtime" approach too
Martin
You're missing a semicolon; and as I explained on my post, you don't need 2^16 entries, just 2^8.
GameZelda
@GameZelda - you can indeed do it with 2^8 entries, but definitely not as you explained. You can do it with a xor table.
IVlad
+1  A: 

Here's a procedural way:

int r = 8;
while (a != b)
{
    a >>= 1;
    b >>= 1;
    r -= 1;
}

Here's a way that uses a lookup table with just 256 entries:

int[] lookupTable;

void createLookupTable()
{
    lookupTable = new int[256];
    for (int a = 0; a <= 255; ++a)
    {
        int n = 8;
        byte b = (byte)a;
        while (b > 0) {
            b >>= 1;
            n -= 1;
        }
        lookupTable[a] = n;
    }
}

int commonPrefix(byte a, byte b)
{
    return lookupTable[a ^ b];
}

And just for fun here's a way to do it with LINQ:

int r = 8 - Enumerable.Range(0, 9).Where(n => a >> n == b >> n).First();
Mark Byers
Points for the LINQ implementation, we all love LINQ :D
Martin
+3  A: 

EDIT: Thanks to the comments, I found that I misunderstanded the problem. (Below is a fixed version).

With a lookup table:

readonly static int[] bytePrefix = new int[] {
    8, 7, 6, 6, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,
    3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};

And use it XORing the two bytes:

bytePrefix[9 ^ 6]

I believe this is as fast as it can get, it's just one XOR operation and an array lookup (you can also change it to 2 array lookups, but it would use 256 times more memory and probably be slower, bitwise it really fast).

GameZelda
x = 1111 0000, y = 1111 0000. Won't this return 0 if the two bytes are equal?
IVlad
@IVIad: No. ORing those 2 values results in 0b11110000, and the common prefix of this is 0 as you can see in the lookup table.
GameZelda
@GameZelda - I'm pretty sure the common prefix for 1111 0000 and 1111 0000 has length 8. Your program returns 0. 0 is wrong.
IVlad
I'm pretty sure the operation you want here is XOR, not OR. `11110000` and `11110000` has a common prefix of 8... XOR will cause zeroes in the positions they have in common, and works with the table you provided.
jdmichal
Damnit, you're right. looks like I did understand the problem wrong. Going to fix my post now.
GameZelda
There. I can finally up-vote your answer!
jdmichal
Looks like I accepted the otehr answer too soon! I was just thinking about embedding the 255*255 lookup table in my code and crying at the mess it would leave ;)
Martin
How about a byte array? That will take up a fourth of the space so it's more likely that it remains in the memory cache, and the lookup should be slightly faster as the index doesn't have to be multiplied to accomodate for the size of an int.
Guffa
A byte array is a good idea!
Martin
+1  A: 

This can be restated as a simpler problem with a known fast solution:

  • Find the left-most true bit in X ^ Y.

Some code (apparently code can't immediately follow a bulleted list?!?)

 int findCommonPrefix(long x, long y, out long common)
 {
    int prefixPlace = 0;
    int testPlace = 32;
    long w, mismatch = x ^ y;
    do {
       w = mismatch >> testPlace;
       if (w != 0) { prefixPlace |= testPlace; mismatch = w; }
       testPlace >>= 1;
    } while (testPlace != 0);
    common = x >> prefixPlace;
    return 64 - prefixPlace;
 }

This needs only 6 iterations to find the common prefix in a 64-bit long, the byte version will need only 3 iterations. Unroll the loop for even more speed.

Ben Voigt
+1  A: 

Another approach using exclusive or (xor):

public int GetCommonPrefixLength(byte a, byte b)
{
    int c = a ^ b;
    int len = -1;
    while ((++len < 8) && ((c & 0x80) == 0))
        c = c << 1;
    return len;
}
arbiter
Doesn't seem to work if I have understand the problem correctly. For example, a = 255 and b = 255 (both 0b11111111, so common prefix = 0), returns 8.
GameZelda
@GameZelda - just out of curiosity, can you tell me what you think a common prefix is?
IVlad
If a == 255 and b == 255 then common prefix will be 255 (length 8) because a and b equals. Read question carefully author needs common prefix, not zeroed common prefix.
arbiter
+3  A: 

First get the binary difference between the bytes using the xor operator. Then you just shift bits out to the right until the difference is zero:

byte b1 = 6;
byte b2 = 9;

int length = 8;
for (int diff = b1 ^ b2; diff != 0; length--) diff >>= 1;

This will give you a minimum of calculations in the loop, so it will be rather fast.

Guffa
+1, Looks almost perfect.
arbiter
A: 

Here's one without a table or a loop:

len =  (a^b) ? (7 - (int)Math.Log( a^b, 2)) : 8;

Explanation:

log2 X is the power to which the number 2 must be raised to obtain the value X. Since each bit in a binary number represents the next power of 2, you can use this fact to find the highest bit set (counting from 0):

2**0   = 1 = 0b0001;  log2(1) = 0
2**1   = 2 = 0b0010;  log2(2) = 1
2**1.6 =~3 = 0b0011;  log2(3) =~1.6; (int)log2(3) = 1
2**2   = 4 = 0b0100;  log2(4) = 2
...
2**3   = 8 = 0b1000;  log2(8) = 3

So the code works by taking a XOR b, which sets only the bits which are different. If the result is non-zero, we use log2 to find the highest bit set. 7 less the result gives the number of leading zeros = the number of common bits. There is a special case when a XOR b == 0: log2(0) is -Infinity, so that won't work, but we know that all the bits must match, so the answer is 8.

AShelly
Wow, how does that work?
Martin
added explanation
AShelly
A: 
supercat