views:

255

answers:

2

I apologize if this has been answered elsewhere, but I have yet to find it using my limited algorithmic terminology. ;)

My situation is this - I have a variable number of data elements, each of which has been tested against each other data element to determine compatibility. The compatibilities are stored in the equivalent to a 2-dimensional array (truth table?). My goal is to generate all possible combinations of these data elements, where every element within the combination is compatible with each other element.

For example, if element 1 (of 4) was compatible with elements 2 and 4, and element 2 was compatible with 1, 3 and 4, element 3 was compatible with 2 and element 4 was compatible with 1 and 2, my truth table would look something like:

1) {1,1,0,1}
2) {1,1,1,1}
3) {0,1,1,0}
4) {1,1,0,1}

The combinations that I want from this would be:
1,2,4
1,2
1,4
1
2,3
2,4
2
3
4

My approach works well in many situations, but is sometimes bogged down badly when the number of elements exceeds 5000, depending on the data sets. My secondary challenge is to determine the pattern that brings execution time up from 5 seconds to 3 hours...

Just from looking at the boolean array, I just feel like there MUST be an easier solution out there - an algorithm named after somebody, maybe. As you might infer from the above, I don't necessarily know how to ask the question. ;)

Thanks for your time!

+1  A: 

You're basically trying to reduce an expression to disjunctive normal form. In general, this problem is NP-complete. Sorry. ^_^

Thom Smith
If this turns out to be the case, then it all boils down to my implementation. I just wanted to make sure I wasn't missing out on what might be an obvious solution. :)Thanks Thom!
Mike
+4  A: 

I think that what you have is an 'adjacency matrix' not a truth table and that you are looking for all the 'fully connected subgraphs' of the graph of which the adjacency matrix is a representation. Fully connected subgraphs are also known, if memory serves, as 'cliques'. I'm not terribly sure about what you are looking for; as one of the earlier respondents indicated there are some discrepancies between your words and your matrix.

Do some googling around on those terms; right here right now it's too late for me to dig this stuff out of either my head or my library.

Note that your graph is symmetric, that is if '1 is compatible with 2' then necessarily '2 is compatible with 1'. Now that's halved your data storage requirements (made them more complex, storing the upper or lower half of the matrix is often more mind-bending than the space it saves warrants). I think also that you should probably have 1s along the main diagonal, to express the idea that '1 is compatible with 1'. Eventually, I suspect, you'll have some elements which are only compatible with themselves.

Finding cliques in a graph is, sadly, NP, but for matrices of only 5000x5000 elements a brute-force naive algorithm shouldn't take too long in a compiled language.

Regards

Mark

High Performance Mark
Adjacency Matrix - a term I'll need to remember. ;)Thank you for providing some other terms as well - my background doesn't include much in the way of algorithmic training, but I can't remember enjoying any other project as much as this one - so much to learn! :)For data storage - I am actually using less than half of the NxN - my approach ensures that I'm only ever comparing lower indexes to higher indexes. (Although now I wonder if the math used to determine the actual index is adding to my performance woes...)I'm coding in Java - might compare to C++ later. :)Thanks Mark! :)
Mike
Neat interpretation – it's one thing to know that all NP-complete problems are isomorphic, but another thing to see it so well-demonstrated!
Thom Smith