views:

67

answers:

2

My question is rather complicated for me to explain, as i'm not really good at maths, but i'll try to be as clear as possible.

I'm trying to code a cluster in python, which will generate words given a charset (i.e. with lowercase: aaaa, aaab, aaac, ..., zzzz) and make various operations on them. I'm searching how to calculate, given the charset and the number of nodes, what range each node should work on (i.e.: node1: aaaa-azzz, node2: baaa-czzz, node3: daaa-ezzz, ...). Is it possible to make an algorithm that could compute this, and if it is, how could i implement this in python?

I really don't know how to do that, so any help would be much appreciated

+1  A: 

Any way that you could compute a small integer from the string would be fine for clustering. For example, compute a hash with md5, and look at a byte of it:

import hashlib

s = "aaac"
num_nodes = 5 # or whatever
m = hashlib.md5(s)
node = ord(m.digest()[0]) % num_nodes

print node  # prints 2

This won't guarantee to evenly distribute all the strings, but it will be close.

Ned Batchelder
i can't understand how to use your piece of code to know the "words range" to give to each node, could you be more precise please?
MatToufoutu
This code doesn't know anything about word ranges. It takes a word (s in the sample), and determines a node number to assign it to. The assignment isn't based on ranges, it's apparently random.
Ned Batchelder
ok, that's not what i'm looking for, but thanks anyway
MatToufoutu
+1  A: 

You should be able to treat your words as numerals in a strange base. For example, let's say you have a..z as your charset (26 characters), 4 character strings, and you want to distribute among equally 10 machines. Then there are a total of 26^4 strings, so each machine gets 26^4/10 strings. The first machine will get strings 0 through 26^4/10, the next 26^4/10 through 26^4/5, etc.

To convert the numbers to strings, just write the number in base 26 using your charset as the numbers. So 0 is 'aaaa' and 26^4/10 = 2*26^3 + 15*26^2 + 15*26 +15 is 'cppp'.

Keith Randall
thanks a lot, i'll see if i can do what i want using this method
MatToufoutu