views:

101

answers:

1

Given:

  • a directed Graph
  • Nodes have labels
  • the same label can appear more than once
  • edges don't have labels

I want to find the set of largest (connected) subgraphs which are equal taking the labels of the nodes into account.

The graph could be huge (millions of nodes) does anyone know an efficient solution for this?

I'm looking for algorithm and ideally a Java implementation.

Update: Since this problem is most likely NP-complete. I would also be interested in an algorithm that produces an approximated solution.

This seems to be close at least: Frequent Subgraphs

+5  A: 

I strongly suspect that's NP-hard.

Even if all the labels are the same that's at least as hard as graph isomorphism. (Join the two graphs together as a single disconnected graph; are the largest equal subgraphs the two original graphs?)

If identical labels are relatively rare it might be tractable.

Captain Segfault
+1, good analysis.
j_random_hacker
good point, didn't see the relation to graph isomorphism. But yes the number of identical labels should be relatively small.
kohlerm