I'm looking for implementations of set reconciliation algorithm. The problem is following: there are two sets with elements identified by some relatively compact value (e.g. UUID or MD5/SHA1/whatever hash) sitting on different machines. These sets differ in relatively few elements and I want to synchronize these sets while transferring minimal amount of data. Most of googling leads here. This is GPL'd implementation of what seems to be the state-of-art approach to the task. The problem is that I can't use GPL'd code in my app. Most likely I'll have to reimplement it myself using something like nzmath, but maybe there are other implementations (preferably Python or C/C++), or maybe there are other nicer algorithms?
+1
A:
Not being able to use GPL is often a matter of abstraction; that is if it is the license you have problems with. So if you create a small GPL application (released under GPL) you can call this from your non-GPL application. Why re-invent the wheel?
Especially if you can use a python script which already exists: why not leverage it? Of course things are different if you can not expose the element reconsolidation algorithms.
Adriaan
2009-09-22 11:28:12
I've considered doing this, but I somehow feel bad about such GPL workaround as resulting GPL'd application will be rather useless without it's unGPL'd wrapper.
fionbio
2009-09-22 12:11:09
it would still comply with GPL. Other people may re-use ideas, so it still fits the GPL ideas. It beats re-inventing the wheel imho.
Adriaan
2009-09-22 13:35:15
A:
This code is out of my head, and thus covered by whatever license applies for code samples in this site.
# given two finite sequences of unique and hashable data,
# return needed opcodes and data needed for reconciliation
def set_reconcile(src_seq, dst_seq):
"Return required operations to mutate src_seq into dst_seq"
src_set= set(src_seq) # no-op if already of type set
dst_set= set(dst_seq) # ditto
for item in src_set - dst_set:
yield 'delete', item
for item in dst_set - src_set:
yield 'create', item
Use as follows:
for opcode, datum in set_reconcile(machine1_stuff, machine2_stuff):
if opcode == 'create':
# act accordingly
elif opcode == 'delete':
# likewise
else:
raise RuntimeError, 'unexpected opcode'
ΤΖΩΤΖΙΟΥ
2009-10-21 21:13:57
Ok, but you don't have both sets on any of machines, that's the problem.
fionbio
2009-10-22 13:08:16
If you can't transfer the list of keys from one machine, how will you transfer the items? Please re-specify your objection (if it is one). You asked for non-GPL code to reconcile two sets, I didn't see a request for code that manages transfers to and fro, transferred items being either unique IDs or items.
ΤΖΩΤΖΙΟΥ
2009-10-22 18:09:30
The thing is, if you have two LARGE sets of keys such as UUIDs sitting on different machines, a trick exists that allows one to get e.g. union of these sets on both sides without passing too much data (see link in the question). What happens after full key lists are obtained on both sides is trivial and thus of little interest for me.
fionbio
2009-10-27 16:00:37
A:
This doesn't help much in terms of avoiding the GPL, but there is another implementation that is part of the SKS replicated OpenPGP keyserver:
http://code.google.com/p/sks-keyserver/
This one is in OCaml.
zrr
2010-02-20 23:50:15