views:

200

answers:

1

Please help me out with an algorithm for the following problem -

Given a collection of facts, we would like to get rid of as much redundancy as possible. The facts involved in this problem are members of a transitive relation between uppercase letters. So each fact is a pair of uppercase letters such as AB meaning that A is related to B. A letter may or may not be related to itself, but transitivity holds: if A is related to B and B is related to C then we can infer that A is related to C. Create a class FactCount that contains a method minFacts that is given a String[] known and that returns the size of the smallest set of facts that will allow us to infer everything (and only those things) that can be inferred from the facts contained in known.

Each element of known will contain 1 or more facts separated by a single space. The smallest set of facts may contain facts that can be inferred from known but that are not contained in it.

For example:

{"AB AC CA AA BC", "AD"}

Returns: 4

AB, CA, BC, and AD allow us to infer both AA (AB, BC, CA gives AA by transitivity) and AC (AB, BC gives AC by transitivity), and there is no smaller subset that allows us to infer all the known facts.

P.S - Its NOT homework. Just a problem I found online and have been unable to solve for hours...

+5  A: 

It looks to me you are looking for a spanning tree of your graph.

A spanning tree of a connected graph G can also be defined as a maximal set of edges of G that contains no cycle, or as a minimal set of edges that connect all vertices.

From the spanning tree, you have a minimal representation of the graph. Any other addition of edges will create a cycle, and therefore redundant information for the relationships among nodes.

Also please note that if, at the end of the algorithm, you have non-connected nodes remaining (meaning, there are nodes in your graph that your MST does not touch), it means that the spanning tree you obtained is an independent entity, and there's no relationship (edge) connecting the nodes of your spanning tree and the nodes not in it.

In more mathematical terms, the nodes belonging to your spanning tree and the nodes not belonging to it are disjoint sets, and none of them is the empty set.

You can of course perform the MST search again on the remaining subset to generate a spanning forest, until you exhaust your set of nodes.

Stefano Borini
Can you point to an algorithm for finding the minimum spanning tree for directed graphs? Cant find on on Google.
Pranav
If you work in python, there's NetworkX. It's a very complete library. http://networkx.lanl.gov/reference/generated/networkx.mst.html#networkx.mst
Stefano Borini
For the algorithm, just copy what they use ;)
Stefano Borini
Would like to implement it myself...
Pranav
Ten it's the Kruskal’s algorithm. It is made for weighted graphs, but you should have no problem. Just assign the same weight to every edge. Of course, the MST is not unique.
Stefano Borini
Kruskal's algo wont work for a directed graph. However found this link - algowiki.net/wiki/index.php/Edmonds's_algorithm
Pranav
If you have a connected graph you can also use Prim's http://en.wikipedia.org/wiki/Prim's_algorithm
Malaxeur