views:

57

answers:

2

In the Kademlia protocol node IDs are 160 bit numbers. Nodes are stored in buckets, bucket 0 stores all the nodes which have the same ID as this node except for the very last bit, bucket 1 stores all the nodes which have the same ID as this node except for the last 2 bits, and so on for all 160 buckets.

What's the fastest way to find which bucket I should put a new node into?

I have my buckets simply stored in an array, and need a method like so:

Bucket[] buckets; //array with 160 items

public Bucket GetBucket(Int160 myId, Int160 otherId)
{
    //some stuff goes here
}

The obvious approach is to work down from the most significant bit, comparing bit by bit until I find a difference, I'm hoping there is a better approach based around clever bit twiddling.

Practical note: My Int160 is stored in a byte array with 20 items, solutions which work well with that kind of structure will be preferred.

+1  A: 

For starters you could compare byte-by-byte (or word-by-word), and when you find a difference search within that byte (or word) for the first bit of difference.

It seems vaguely implausible to me that adding a node to an array of buckets will be so fast that it matters whether you do clever bit-twiddling to find the first bit of difference within a byte (or word), or just churn in a loop up to CHAR_BIT (or something). Possible, though.

Also, if IDs are essentially random with uniform distribution, then you will find a difference in the first 8 bits about 255/256 of the time. If all you care about is average-case behaviour, not worst-case, then just do the stupid thing: it's very unlikely that your loop will run for long.

For reference, though, the first bit of difference between numbers x and y is the first bit set in x ^ y. If you were programming in GNU C, __builtin_clz might be your friend. Or possibly __builtin_ctz, I'm kind of sleepy...

Your code looks like Java, though, so I guess the bitfoo you're looking for is integer log.

Steve Jessop
C#, close to java. As you say, speed of this algorithm probably isn't important, I mostly asked the question out of interest. Previous questions about bit twiddling have turned up some clever tricks which were fun to unravel how they worked
Martin
+2  A: 

Would you consider be willing to consider an array of 5 32-bit integers? (or 3 64-bit integers)? Working with whole words may give you better performance than working with bytes, but the method should work in any case.

XOR the corresponding words of the two node IDs, starting with the most significant. If the XOR result is zero, move on to the next most significant word.

Otherwise, find the most significant bit that is set in this XOR result using the constant time method from Hacker's Delight.. This algorithm results in 32 (64) if the most significant bit is set, and 1 if the least significant bit is set, and so on. This index, combined with the index of the current word, will will tell you which bit is different.

erickson
Bytes seems like the most natural way to do things, as you say integers will probably be faster
Martin