ansaurus

Question

computing "node closure" of graph with removal

Answer 1

+2 A:

I'll try to characterize the problem in a different way. You want to separate the vertices into two groups. The first group consists of "root nodes", and the second of "dependent nodes", i.e. nodes directly connected to root nodes.

        root     dependent
                  B
           A  <
                  C

                  E
           D  <
                  F

^{Fig. 0: A possible result graph.}

You want to minimize the number of root nodes. This is the same as maximizing the number of dependent nodes, which is the same as maximizing the number of edges in the resulting graph, which is the same as minimizing the number of edges removed in constructing it from the start graph.

Let's first look at brute force: For each bipartition of the nodes into a root and a dependent set, first check if it fulfills the criteria of the problem statement, finally take the best possible. There are an exponential number of bipartitions, so this will have to be refined.

If we regard each edge as having the possible status "unknown", "take", or "remove", taking an edge removes all edges originating from its end node, as well as all edges ending there (see Fig. 1). However, we can retain at most one of the edges ending in a specific node, anyway.

    A B C
     \|/
      D
     /|\
    E F G

^{Fig 1: Taking the edge A-D removes all other edges diplayed here.}

There are a number of "greedy" heuristics:

Put the node with the most outgoing edges into the root set and all its connected nodes into the dependent set (I think that this was your proposal).

This has the problem that some of the connected nodes would be better put into the root set, which brings us to the first refinement:

Put the node with the most outgoing connections into the root set and mark all its connected nodes as "reachable". Then loop, but only count connections to nodes that are not "reachable" at each point. Stop when all nodes are in the root set or "reachable". Finally, put all "reachable" nodes that are not in the root set into the dependent set, and arbitrarily distribute them among the root nodes they can be connected to.

This looks quite OK, but still not necessarily optimal. Perhaps you can start from the other side:

Put the node with the least number of outgoing connections into the dependent set. "Remove" all its outgoing edges. "Take" its incoming edge that comes from a node with the highest number of outgoing connections, and put that node into the root set. Loop until all nodes are either in the root or dependent set.

I still can't prove whether this might be optimal, but at least I cannot directly think of a situation that breaks it.

Svante 2010-03-12 12:29:44

Answer 2

+1 A:

20! is rather large, larger than 2^61. Fortunately there's a better way to solve small instances: (EDIT) dynamic programming. By saving the optimal solutions to each subproblem, we pay some memory to obtain a very large savings in time.

Here's some sample code in Python. In implementing the code below in another language, you'll probably want to number the vertices 0, ..., n-1 and implement the sets as bit vectors.

# find a smallest node closure of G
# G is a graph in adjacency-list format: G[v] is the set of neighbors of v
def node_closure(G):
    # maps subsets of vertices to their smallest node closure
    smallest = {frozenset(): []}
    def find_smallest(S):
        if S in smallest:
            return smallest[S]
        else:
            candidates = [[v] + find_smallest(S - frozenset([v]) - G[v]) for v in S]
            return min(candidates, key=len)
    return find_smallest(frozenset(G))

This problem has an NP-hardness reduction from set cover that preserves the objective value. This means that unless P = NP, the best guarantee you can obtain for a polynomial-time algorithm is that it always outputs a solution that is at most O(log n) times larger than optimal.

If x1, ..., xm are the elements to be covered and S1, ..., Sn are the sets, then the goal of set cover is to choose the minimum number of sets whose union is {x1, ..., xm}. Assuming that each element appears in at least one set, make a graph with nodes x1, ..., xm, S1, ..., Sn, R where there are arcs from R to all Si and for all i, j, an arc from Si to xj if and only if xj belongs to Si. There's a straightforward correspondence between node closures and set covers: to obtain a node closure from a set cover, remove the vertices corresponding to the sets chosen and then R; to obtain a set cover from a node closure, take all sets whose vertices were chosen plus sets containing each xj whose vertex was chosen.

(Note, for set cover, the greedy algorithm achieves the optimal approximation ratio! Something similar might be true for your problem.)

2010-03-12 14:10:57

Thanks algorithmist! Yes - it doesn't even look like NP-complete to me [Given an answer, how are we going to verify it is optimal solution in polynomial time?]. It definitely looks outside NP to me.I am fine with a non polynomial algorithm, if that's the best we can do, my graph is pretty small[#(V) = 20]. For example generating all permutations is exponential time, but we use it all the time for small sequences. In fact brute force solution is to generate permuataions of V and for each permutation form super nodes in the order [of that permutation] and pick minimum.

Fakrudeen 2010-03-13 06:36:21

Yes - I forgot the power of exponential - 20! operations in a Ghz processor will take 71 years! Currently I am only working with #(V)=15 [~20 minutes] and I added 5 more just to be safe! Dynamic programming looks like the way to go!

Fakrudeen 2010-03-15 06:38:02

ansaurus

tags:

views:

answers:

computing "node closure" of graph with removal

related questions