I'm looking through possible representations for what can be considered a finite depth graph in XML format for data exchange purposes. The problematic point is how to reference nodes in edge tags. Two strategies I see are a) using unique identifiers or b) using paths.
Unique IDs:
<graph id="g0">
<node id="n0"/>
<node id="n1"/>
<edge from="n1" to="n0"/>
</graph>
<graph id="g1">
<node id="n2"/>
</graph>
<edge from="n2" to="n1"/>
Paths:
<graph id="0">
<node id="0"/>
<node id="1"/>
<node id="2"/>
<edge from="1" to="0"/>
<edge from="2" to="1"/>
</graph>
<graph id="1">
<node id="0"/>
</graph>
<edge from="1:0" to="0:2"/>
What is the standard procedure for these kinds of things? From what I've gathered, unique identifier approach seems to be more widespread. My issue with that is when graphs are becoming very large, there's:
- necessity for a really large hash table that maps objects to their IDs for purposes of reading/writing edges from/to XML files
- the file itself is larger than the one written using paths because you cannot omit redundant path components if edge is internal to the graph.
Thoughts?
Update 1:
Note that its not one flat graph; its one or more graphs interconnected. They each have locally indexed elements, but flattening them all out and keeping track of edges across them is a bit of a nuisance.
Update 1.1: Noticed that with sub-graphs in GraphML, they do in fact use complex keys that makes it possible to separate local node id out of the global one.
Update 2:
Yes, obviously this is not a well formed XML and missing tags and all sorts of schema declarations.