views:

236

answers:

6

If I have a set of values (which I'll call x), and a number of subsets of x:

What is the best way to work out all possible combinations of subsets whose union is equal to x, but none of whom intersect with each other.

An example might be:

if x is the set of the numbers 1 to 100, and I have four subsets:

  • a = 0-49
  • b = 50-100
  • c = 50-75
  • d = 76-100

then the possible combinations would be:

  • a + b
  • a + c + d
A: 

A way (may not be the best way) is:

  1. Create a set of all the pairs of subsets which overlap.
  2. For every combination of the original subsets, say "false" if the combination contains one or more of the pairs listed in Step 1, else say "true" if the union of the subsets equals x (e.g. if the total number of elements in the subsets is x)
ChrisW
+10  A: 

What you describe is called the Exact cover problem. The general solution is Knuth's Algorithm X, with the Dancing Links algorithm being a concrete implementation.

Nick Johnson
A: 

The actual algorithm seems largely dependent on the choice of subsets, product operation, and equate operation. For addition (+), it seems like you could find a summation to suit your needs (the sum of 1 to 100 is similar to your a + b example). If you can do this, your algorithm is obviously O(1).

If you have a tougher product or equate operator (let's say taking a product of two terms means summing the strings and finding the SHA-1 hash), you may be stuck doing nested loops, which would be O(n^x) where x is the number of terms/variables.

Jon Smock
A: 

Depending on the subsets you have to work with, it might be advantageous to use a more naive algorithm. One where you don't have to compare the entire subset, but only upper and lower bounds.

If you are talking random subsets, not necesserily a range, then Nick Johnson's suggestion will probably be the best choice.

Yannick M.
+1  A: 

Given a well-order on the elements of x (make one up if necessary, this is always possible for finite or countable sets):

Let "sets chosen so far" be empty. Consider the smallest element of x. Find all sets which contain x and which do not intersect with any of the sets chosen so far. For each such set in turn recurse, adding the chosen set to "sets chosen so far", and looking at the smallest element of x not in any chosen set. If you reach a point where there is no element of x left, then you've found a solution. If you reach a point where there is no unchosen set containing the element you're looking for, and which does not intersect with any of the sets that you already have selected, then you've failed to find a solution, so backtrack.

This uses stack proportional to the number of non-intersecting subsets, so watch out for that. It also uses a lot of time - you can be far more efficient if, as in your example, the subsets are all contiguous ranges.

Steve Jessop
+1  A: 

here's a bad way (recursive, does a lot of redundant work). But at least its actual code and is probably halfway to the "efficient" solution.

def unique_sets(sets, target):
    if not sets and not target:
        yield []
    for i, s in enumerate(sets):
        intersect = s.intersection(target) and not s.difference(target)
        sets_without_s = sets[:i] + sets[i+1:]
        if intersect:
            for us in unique_sets(sets_without_s, target.difference(s)):
                yield us + [s]
        else:
            for us in unique_sets(sets_without_s, target):
                yield us

class named_set(set):
    def __init__(self, items, name):
        set.__init__(self, items)
        self.name = name

    def __repr__(self):
        return self.name

a = named_set(range(0, 50), name='a')
b = named_set(range(50, 100), name='b')
c = named_set(range(50, 75), name='c')
d = named_set(range(75, 100), name='d')

for s in unique_sets([a,b,c,d], set(range(0, 100))):
    print s
zzzeek