views:

131

answers:

4

I'm looking for a way of storing graphs as strings. The strings are to be used as keys in a map, so that two topologically identical graphs will map to the same value in the map. Does anybody know of such an algorithm? The nodes of the tree are labeled with duplicate labels being allowed.

The program is in java and an implementation in that would be neat, but any pointers to possible algorithms are appreciated.

+1  A: 

A common way to do this is using Adjacency lists

stacker
The problem is that the labels on the nodes are not unique. One graph might have several nodes labelled "a" and several labeled "b" in which case an adjacency list would be something likea->b,b;a->c;b->a;b->c;
Thomas
How about adding a unqiue id (serial by incrementing an int a1->b1,a2->c2 etc.)? Otherwise you can't have an inverse funtion http://en.wikipedia.org/wiki/Bijection
stacker
A: 

Beside an Adjacency list, there are adjacency matrices. Which one you choose should depend on which you use to implement your Graph class (adjacency lists are usually the better choice, but they both have strengths and weaknesses). If you have a totally different implementation of Graph, consider using one of these, as it makes many graph algorithms very easy to implement.

One other option is, if possible, overriding hashCode() and equals() on the Graph class and use the actual graph object as the key rather than converting to a string.

E: overriding the hashCode() and equals() is the route I would take if some vertices are not uniquely labeled. As noted in the comments, this can be expensive, but I think it would depend on the implementation of the Graph class.

If equals() is too expensive, then you should use an adjacency list or matrix, but don't just use the node names. You have to carefully specify exactly what it is that identifies individual graphs and vertices (and therefore what would make them equal), and then make your string representation of the adjacency list use those properties instead of the node names. I'd suggest you write this specification of your graph equals operation down.

Sean Nyman
equals() would be very expensive to perform as I would have to run a topology matching algorithm for every time it is called. I'd rather pay up front and allow for very fast insertions into the map.
Thomas
+2  A: 

You may find the following question relevant...

Basically, an automaton can be minimised using well-known algorithms from automata-theory textbooks. Hopcrofts is an example. There is precisely one minimal automaton that is equivalent to any given automaton. However, that minimal automaton may be represented in different ways. Constructing a safe canonical form is basically a matter of renumbering nodes and ordering the adjacency table using information that is significant in defining the automaton, and not by information that is specific to the representation.

The basic principle should extend to general graphs. Whether you can minimise your graphs depends on their semantics, but the basic idea of renumbering the nodes and sorting the adjacency list still applies.

Other answers here assume things about your graphs - for example that the nodes have unique labels that can be ordered and which are significant for the semantics of your graphs, that can be used to identify the nodes in an adjacency matrix or list. This simply won't work if you're interested in morphims of unlabelled graphs, for instance. Different ways of numbering the nodes (and thus ordering the adjacency list) will result in different canonical forms for equivalent graphs that just happen to be represented differently.

As for the renumbering etc, an approach is to borrow and adapt principles from automata minimisation algorithms. Basically...

  • Create a vector of blocks (sets of nodes). Initially, populate this with one block per class of nodes (ie per distinct node annotation). The modification here is that we order these by annotation details (not by representation-specific node IDs).

  • For each class (annotation) of edges in order, evaluate each block. If each node in the block can follow the current edge-type to reach the same set of next blocks, leave it untouched. Otherwise, split it as necessary to get maximal blocks that achieve this objective. Keep these split blocks clustered together in the vector (preserve the existing ordering, just refine it a bit), and order the split blocks based on a suitable ordering of the next-block sets. For example, use bitvectors as long as the current vector of blocks, with a set bit for each block reachable by following the current edge type. To order the bitvectors, treat them as big integers.

EDIT - I forgot to mention - in the second bullet, as soon as you split a block, you restart with the first block in the vector and first edge annotation. Obviously, a naive implementation will be slow, so take the principle and use it to adapt Hopcrofts minimisation algorithm.

If you end up with blocks that have multiple nodes in them, those nodes are equivalent. Whether that means they can be merged or not depends on your semantics, but the relative ordering of nodes within each such block clearly doesn't matter.

If dealing with graphs that can be minimised (e.g. automaton digraphs) I suspect it's best to minimise first, though I still haven't got around to implementing this myself.

The key thing is, of course, ensuring that your renumbering is sensitive only to the significant details of the graph - its structure and annotations - and not the things that are only there so that you can construct a representation such as node IDs/addresses etc.

Once you have the blocks ordered, deriving a canonical form should be easy.

Steve314
Ahh yeah, I hadn't thought about using automaton methods. You're right, there might be something there. I'll have to dig trough my old books then.
Thomas
+1  A: 

Hi,

if you have an algorithm that maps general graphs to strings, and so that two graphs map to the same string if and only if they are topologically equivalent, then you have an algorithm for GRAPH AUTOMORPHISM. Graph automorphism has no known polynomial-time algorithms. So you can't have (easily :) a polynomial-time algorithm that calculates the strings as you postulate them, because otherwise you'd have constructed a previously unknown and very efficient algorithm to graph automorphism.

This doesn't mean that it wouldn't be possible to solve the problem for your class of graphs; it just means that for the class of all graphs it's kind of difficult.

antti.huima