views:

46

answers:

5

Given a matrix of boolean values (example):

    a  b  c
  +--------
X | 1  0  1  
Y | 1  1  1
Z | 0  1  1
  +--------

What's the optimum way to find all sequences [a|b|c] such that each sequence has at least one "true" (1) value for each of X, Y and Z.

In the example above, the set of sequences is (in no particular order):

  abc, ab, ac, bc, c

Whereas these sequences do not satisfy the requirement:

  a (doesn't provide Z)
  b (doesn't provide X)

I'm actually looking for a generalized algorithm for an arbitrary matrix.

Any ideas?

+2  A: 
  1. Get the sequence that is the NOT of each row.
  2. Determine all of the sub-sequences of those sequences.
  3. Your answer is all of the sequences that are left.

In essence, you're taking the union of (sequences that don't provide X) and (sequences that don't provide Y) and (sequences that don't provide Z), and taking the complement of that set.

I think this generalizes.

John at CashCommons
And if the arbitrary matrix has more zeros than ones, this is worse than the positive version - if it only has ones on the lead diagonal, you are calculating the union of N2<sup>(N-1)</sup> sets which don't match to find the one set which does. This is an optimisation to apply to some matrices; there is no general approach which is better than 2<sup>N</sup>
Pete Kirkham
Good points. Thank you!
John at CashCommons
A: 

One possible algorithm that runs in O(2^n) time would be to try every possibility recursively.

So in pseudo code:

validSequence(vector<columns> cols, vector<columns> sequence)
    // base case for recursion
    if (cols.empty)
        if (validSequence(sequence))
            printSequence(sequence)
        else
            // invalid sequence
    // now we recurse using the next row and not using the next row
    column nextRow = pop(cols)
    validSequence(cols, sequence)
    sequence.push(nextRow)
    validSequence(cols, sequence)

Obviously you still need to implement the validSequence(sequence) function, but I'm pretty sure this would work

Hope this helps you

Matt
Thanks very much!
jsinnott
+2  A: 

You can determine the set that fulfills X (a,ab,ac,abc,bc,c), intersect it with the set that fulfills Y (a,ab,ac,abc,b,bc,c), and intersect the result with the set that fullfills Z (ab, ac, abc, b, bc, c) etc.

For bigger Matrices you can use a divide and conquer schema from this operation: Take half of the conditions and determine their sets, take the other half of the conditions and determine their sets, and intersect both sets. Repeat for both parts...

Landei
+2  A: 

Basically you're solving X & Y & Z where X = a | c, Y = a | b | c, Z = b | c for a, b and c.

So basically you're looking for all solutions of SAT on a formula in conjunctive normal form. Although some simplifications exist, basically any approach will be O(2N). The algorithms for efficiently finding a solution of this class of problem are listed in the wikipedia article; however if you want all solutions you will be iterating over 2N possibilities anyway, so simply enumerating them and testing will suffice.

A few trivial optimisations:

  • there are no negations in the formula, so if you have a solution, then adding any other variable to that solution will also be a solution. Your OP says you want all solutions; for large matrices there will be many solutions based on the same necessary basis and all selections of unnecessary values.
  • if you and all the values in a column and get 1, then that column is a basis for a solution (that column and any selection of other columns is a solution)
  • if you or all the values in a column and get 0, then that column does not contribute to the basis of any solution
  • approaches based on combining sets to find bases (only add a column to the set if it selects a row which is not selected) might be better than approaches finding all solutions
  • if you go with the approach of combining sets which do/do solve the problem - the union of non-solutions or the intersection of solutions - it may be better to divide the rows into two, and use positive approach for rows with few 1s, and the negative approach for rows with few 0s.

Since it is a 2N problem, it is reasonable to assume that N<64, and so you can make some implementation optimisations, but you won't reduce the size of the problem.

Pete Kirkham
Hey Pete-- Thanks *very* much. Your post was very helpful.
jsinnott
A: 

represent your matrix as a graph and then you can then find all the clusters. I'm pretty sure this generalizes (in graphviz notation) since you can just treat your matrix as an adjacency matrix and use

graph G {

A -- X;
A -- Y;
B -- X;
B -- Y;
C -- X;
C -- Y;
C -- Z;

}

ABC can reach XYZ BC can reach XYZ AC can reach XYZ AB can reach XYZ C can reach XYZ

A can NOT reach XYZ B can NOT reach XYZ

http://rollerjm.free.fr/pro/graphs.html

Joshua Smith