This is related to consistent hashing and while I conceptually understand what I need to do, I'm having a hard time translating this into code.

I'm trying to divide a given keyspace (say, 128 bits) into equal-sized partitions. I want the upper bound (highest key) of each partition.

Basically, how would I complete this?

#define KEYSPACE_BYTE_SIZE  16
#define KEYSPACE_BIT_SIZE   (KEYSPACE_BYTE_SIZE * 8)

typedef struct _key
{ 
    char byte[KEYSPACE_BYTE_SIZE];
} key;

key * partition_keyspace( int num_partitions )
{
    key * partitions = malloc( sizeof(key) * num_partitions );

    // ...

}

Edit:

I suppose another way of saying this is:

for (i = 0; i < num_partitions; i++)
{
    partitions[i] = ((2 ^ KEYSPACE_BIT_SIZE) / num_partitions) * i;
}

Of course, the problem is that 2 ^ 128 (meaning 2 to the power of 128 here, not C's XOR operator) is a very large number and can't be contained in any single integer variable in C with which to do the math (hence the char[16] struct).

I really don't want to use a large number library (or any library) for this.

Edit:

Actually, the numbers I'm looking for are:

for (i = 0; i < num_partitions; i++)
{
    partitions[i] = (((2 ^ KEYSPACE_BIT_SIZE) / num_partitions) * (i + 1)) - 1;
}
A: 

I am not sure I understand the context of your question - I've not studied consistent hashing.


The question almost amounts to, "how can I sort without sorting".

Another approach might be to do this:

iter = seed() #initialize to the bottom of the hash keys
for(i = 0 to partitionbound)
{
   iter = nextIter(iter);
}

This is in linear time. However, it requires no a priori knowledge of the key space except that there is some order which nextIter obeys.

If you are partitioning [0, 2^128) -> {values}, e.g., you're doing some distributed computing or what have you, you're in much better luck, since integers are well-structured.

I would suggest the slightly silly idea of having 4 32-bit ints in a struct and writing your own bigint routine that solves what you need to solve.

If you aren't tied to C, Common Lisp has bigints built in. I've found that handy.
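
The "four 32-bit ints in a struct" idea can be sketched as follows. This is only an illustration under stated assumptions, not the asker's code: the `key_limbs` type and the `partition_upper_bound` function are hypothetical names, `limb[0]` is assumed to be the most significant word, and the last partition is assumed to absorb any remainder when `n` does not divide the keyspace evenly. It computes the asker's formula `((2^128 / n) * (i + 1)) - 1` with schoolbook long division, multiplication, and a borrow.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit key: four 32-bit limbs, limb[0] most significant. */
typedef struct { uint32_t limb[4]; } key_limbs;

/* Highest key of partition i (0-based) when the keyspace is split n ways. */
static key_limbs partition_upper_bound(uint32_t n, uint32_t i)
{
    key_limbs r;
    uint32_t q[4];
    uint64_t rem = 1, carry = 0;
    int j;

    assert(n > 0 && i < n);

    /* Assumption: the last partition ends at the highest possible key. */
    if (i == n - 1) {
        memset(&r, 0xFF, sizeof r);
        return r;
    }

    /* q = floor(2^128 / n). The dividend is a 1 followed by four zero
       limbs; n >= 2 here, so the leading 1 just becomes the remainder. */
    for (j = 0; j < 4; j++) {
        uint64_t cur = rem << 32;
        q[j] = (uint32_t)(cur / n);
        rem = cur % n;
    }

    /* r = q * (i + 1), least significant limb first. Since i + 1 < n,
       the product stays below 2^128 and the final carry is zero. */
    for (j = 3; j >= 0; j--) {
        uint64_t t = (uint64_t)q[j] * (i + 1) + carry;
        r.limb[j] = (uint32_t)t;
        carry = t >> 32;
    }

    /* r -= 1, borrowing through limbs that are zero. */
    for (j = 3; j >= 0; j--) {
        if (r.limb[j]-- != 0) break;
    }
    return r;
}
```

Calling `partition_upper_bound(n, i)` for `i` from `0` to `n - 1` yields the bounds in ascending order. Only division by a 32-bit value is needed, which is why no general bignum library is required.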


If you have representable keys...

However, when seeking k equally sized partitions of some space a with n elements, I would approach the problem like this:

if( n % k)
{
   return "not equal-sized partition!"
}
size = n / k; //elements per partition
//could be forking/threading, whatever.
for(int i = 0; i < n; i+=size)
{
   process(i, i+size-1);
}


process(bottom, top)
{
   sort(a[bottom], a[top]);
   return a[top]; //you'll have to figure out where to dump the results.
}
Paul Nathan
The space is not in some array or list of items you can manipulate. I only need to know the partitions. It's sort of like saying if you have all four letter words from AAAA to ZZZZ, split them into 10 equal partitions and tell me the last word in each partition. Now do that for bytes instead of letters and KEYSPACE_SIZE_BYTES number of bytes per "word" instead of four.
pbhogan
@pbhogan: (1) Can you compute an arbitrary value based upon a given key? (2) I assume that you can perform ordering upon the keys?
Paul Nathan
There are way too many keys to generate them all and then order them. This is not an operation on a set of keys but on the complete keySPACE (all possible keys). For a 128 bit keyspace we're talking about 2 ^ 128 possible keys... I only want the last possible key in each of *n* partitions.
pbhogan
@pbhogan: I understand better now - you are trying to address elements that aren't, technically, addressable directly. :)
Paul Nathan
Right. :) It is doable (see my edit) with some kind of bignum library like *gmplib*, but I'm sure there is a simpler way of doing it.
pbhogan
@pbhogan: can you limit yourself to partitions of powers of 2?
Paul Nathan
Incidentally, the usefulness of this is a case where you place the partitioning keys in some kind of ordered tree (say a red-black tree) and then you can, given a key, find which partition it should be in. Useful in distributed hash tables and such.
pbhogan
Yes, I can impose rules on num_partitions, like it must be a power of 2 or must divide into the keyspace equally (which is probably the same thing in this case).
pbhogan
@pbhogan - updated with my thoughts on an approach.
Paul Nathan
+1  A: 

The highest key in any particular partition will obviously have all 1-bits in its key portion. If you use the lower n bits for your keys and the upper m bits for your partition-ids, then all you need to do is run an m-bit counter and concatenate it with n ones.
To illustrate, assume an 8-bit keyspace with the upper 2 bits for the partitions (so num_partitions = 2^2 = 4) and the lower 6 for the keys. The highest key in each partition will be one of these four:

00 111111
01 111111
10 111111
11 111111

In order to generate them, all you need to do is:

for (int i = 0; i < num_partitions; i++)
    highest_key = (i << 6) | 0x3f; // where 6 is key_bits and 0x3f is six ones.

Of course, this assumes num_partitions is a power of two.

Naturally, for a key-space as large as yours it won't be as simple as the above, since you can't fit everything into a single variable. Still, the principle remains the same. As long as your num_partitions is small enough, you can fit the counter into an ordinary int variable, copy it into the upper bits, and then filling the rest with ones is trivial.
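
For a 16-byte key, the same idea might look like the sketch below. The names `byte_key` and `highest_key` are hypothetical, `byte[0]` is assumed to be the most significant byte, and the partition counter is assumed to fit in the top 32 bits (so at most 2^32 partitions).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KEYSPACE_BYTE_SIZE 16

/* Hypothetical key type; byte[0] is assumed to be the most significant. */
typedef struct { unsigned char byte[KEYSPACE_BYTE_SIZE]; } byte_key;

/* Writes the highest key of partition i, where the number of partitions
   is 2^partition_bits and partition_bits is at most 32. */
static void highest_key(byte_key *out, unsigned partition_bits, uint32_t i)
{
    uint64_t top;

    assert(partition_bits <= 32);
    assert((uint64_t)i < ((uint64_t)1 << partition_bits));

    /* Fill the whole key with ones first. */
    memset(out->byte, 0xFF, sizeof out->byte);

    /* Top 32 bits: the counter in the upper partition_bits bits, ones
       below it. 64-bit math keeps shifts by 0 and by 32 well defined. */
    top = ((uint64_t)i << (32 - partition_bits))
        | ((uint64_t)0xFFFFFFFFu >> partition_bits);

    out->byte[0] = (unsigned char)(top >> 24);
    out->byte[1] = (unsigned char)(top >> 16);
    out->byte[2] = (unsigned char)(top >> 8);
    out->byte[3] = (unsigned char)top;
}
```

With `partition_bits = 2`, the top byte of the four results is `0x3F`, `0x7F`, `0xBF`, and `0xFF`, which matches the 8-bit illustration above, and every remaining byte is `0xFF`.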

tzaman
Thanks! That's the key I needed. :)
pbhogan
You're welcome! :)
tzaman
A: 

Based on tzaman's answer, here is my solution. As written, it allows up to 254 partitions, since the assert rejects 0xFF and above (although this could be altered). It does NOT require a power of 2 num_partitions... it'll just make the last partition take up whatever's left.

Let me know if you see any bugs... :)

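#include <assert.h>
#include <stdlib.h>
#include <string.h>

// uses the key struct from the question
key * partition_keyspace( unsigned int num_partitions )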
{
    assert( num_partitions > 0 );
    assert( num_partitions < 0xFF );

    key * partitions = (key *) malloc( sizeof(key) * num_partitions );

    // fill every bit
    memset( partitions, 0xFF, sizeof(key) * num_partitions );

    // calculate how many bits of the top byte needs to be filled by 1's
    unsigned char fill_bits = 0;
    while (num_partitions > (1 << fill_bits)) fill_bits++;
    fill_bits = 8 - fill_bits;

    // fill the top byte with the base number of 1's
    unsigned char fill_part = 0;
    for (unsigned int i = 0; i < fill_bits; i++) fill_part |= 1 << i;

    // last partition takes up whatever remains, so don't process it (hence the -1)
    for (unsigned char i = 0; i < num_partitions - 1; i++)
    {
        partitions[i].byte[0] = fill_part | (i << fill_bits);
    }

    return partitions;
}
pbhogan
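
The lookup mentioned in the comments (given a key, find which partition it belongs to) could also be done with a binary search over the sorted upper bounds instead of a tree. The following is a rough sketch, not part of the solution above: `byte_key` and `find_partition` are hypothetical names, `byte[0]` is assumed to be the most significant byte, and the last bound is assumed to be the highest possible key, so every key falls into some partition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEYSPACE_BYTE_SIZE 16

/* Hypothetical key type; unsigned bytes, byte[0] most significant. */
typedef struct { unsigned char byte[KEYSPACE_BYTE_SIZE]; } byte_key;

/* Returns the index of the first partition whose upper bound is >= k.
   bounds must be sorted ascending and hold n > 0 entries, with the
   last entry equal to the highest possible key. */
static size_t find_partition(const byte_key *bounds, size_t n,
                             const byte_key *k)
{
    size_t lo = 0, hi = n - 1;

    assert(n > 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        /* memcmp compares bytes as unsigned char, which matches the
           numeric order of big-endian keys. */
        if (memcmp(k->byte, bounds[mid].byte, KEYSPACE_BYTE_SIZE) <= 0)
            hi = mid;       /* k fits in partition mid or an earlier one */
        else
            lo = mid + 1;   /* k lies above partition mid */
    }
    return lo;
}
```

The search takes O(log n) comparisons, the same order as a balanced tree, and works directly on the array returned by a partitioning function.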