
There are three ways to store a graph in memory:

  • nodes as objects and edges as pointers
  • a matrix containing all edge weights between numbered node x and node y
  • a list of edges between numbered nodes

I know how to write all three, but I'm not sure I've thought of all of the advantages and disadvantages of each.

What are the advantages and disadvantages of each of these ways of storing a graph in memory?

A:

One way to analyze these is in terms of memory and time complexity (which depends on how you want to access the graph).

Storing nodes as objects with pointers to one another

  • The memory complexity for this approach is O(n) for the node objects, because you have as many objects as you have nodes. The number of pointers (to nodes) equals the number of edges, which is up to O(n^2) because each node object may contain pointers to up to n nodes.
  • The time complexity for accessing any given node in this data structure is O(n), since you may have to traverse the graph to find it.
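A minimal Python sketch of the object/pointer layout, assuming directed edges. The `Node` class and the `complete_graph` and `count_pointers` helpers are illustrative names, not from the answer. It shows that the node objects grow with n, while the stored references grow with the number of edges, reaching n(n-1) when every node points at every other node.

```python
class Node:
    """A graph node holding direct references ("pointers") to its neighbours."""

    def __init__(self, label):
        self.label = label
        self.neighbours = []  # one reference per outgoing edge

    def connect(self, other):
        # Directed edge self -> other: store a reference to the other object
        self.neighbours.append(other)


def complete_graph(n):
    """Every node points at every other node: the worst case for pointer count."""
    nodes = [Node(i) for i in range(n)]
    for a in nodes:
        for b in nodes:
            if a is not b:
                a.connect(b)
    return nodes


def count_pointers(nodes):
    """Total references stored across all nodes, i.e. the number of directed edges."""
    return sum(len(node.neighbours) for node in nodes)


nodes = complete_graph(4)
print(len(nodes))             # 4 node objects: O(n)
print(count_pointers(nodes))  # 12 = 4 * 3 pointers: O(n^2) in the dense case
```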

Storing a matrix of edge weights

  • The memory complexity of the matrix is O(n^2), regardless of how many edges the graph actually has.
  • The advantage of this data structure is that looking up the edge between any two given nodes takes O(1) time.
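A rough Python sketch of the matrix representation, assuming directed edges and using `None` to mean "no edge" (that sentinel, and the `make_matrix`/`edge_weight` names, are choices made here for illustration). The matrix is allocated at full n x n size up front, and an edge lookup is a single index operation.

```python
def make_matrix(n, edges):
    """Build an n x n weight matrix; None marks 'no edge'."""
    matrix = [[None] * n for _ in range(n)]  # O(n^2) memory regardless of edge count
    for u, v, w in edges:
        matrix[u][v] = w  # directed edge u -> v with weight w
    return matrix


def edge_weight(matrix, u, v):
    # O(1): a single index into row u, column v
    return matrix[u][v]


m = make_matrix(3, [(0, 1, 2.5), (1, 2, 4.0)])
print(edge_weight(m, 0, 1))  # 2.5
print(edge_weight(m, 2, 0))  # None
```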

Depending on what algorithm you run on the graph and how many nodes there are, you'll have to choose a suitable representation.

f64 rainbow
A: 

I think your first example is a little ambiguous -- nodes as objects and edges as pointers. You could keep track of these by storing only a pointer to some root node, in which case accessing a given node may be inefficient (say you want node 4 -- if the node object isn't provided, you may have to search for it). In this case, you'd also lose portions of the graph that aren't reachable from the root node. I think this is the case f64 rainbow is assuming when he says the time complexity for accessing a given node is O(n).
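The root-pointer case can be sketched in Python roughly like this, assuming you keep only a reference to one root node. The `Node` class and `find_node` are hypothetical names. Finding a node means searching outward from the root (here with breadth-first search, whose cost grows with the reachable nodes and edges), and a node with no path from the root is simply never found.

```python
from collections import deque


class Node:
    def __init__(self, label):
        self.label = label
        self.neighbours = []  # references to other Node objects


def find_node(root, label):
    """Breadth-first search from root; visits each reachable node at most once."""
    seen = {id(root)}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.label == label:
            return node
        for nxt in node.neighbours:
            if id(nxt) not in seen:
                seen.add(id(nxt))
                queue.append(nxt)
    return None  # not reachable from root (or does not exist)


n1, n2, n3, n4 = (Node(i) for i in range(1, 5))
n1.neighbours.append(n2)
n2.neighbours.append(n3)
# n4 has no incoming edge from anything reachable via n1

print(find_node(n1, 3).label)  # 3
print(find_node(n1, 4))        # None: node 4 is effectively lost
```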

Otherwise, you could also keep an array (or hashmap) full of pointers to each node. This allows O(1) access to a given node, but increases memory usage a bit. If n is the number of nodes and e is the number of edges, the space complexity of this approach would be O(n + e).
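A minimal sketch of the "array of pointers" variant, assuming nodes are numbered 0..n-1 so a plain Python list can serve as the array (a dict would play the hashmap role). The `Graph` and `Node` classes are illustrative. Lookup by number is O(1), and an isolated node such as node 2 below is still reachable through the index.

```python
class Node:
    def __init__(self, label):
        self.label = label
        self.neighbours = []  # one reference per outgoing edge: O(e) in total


class Graph:
    """Nodes numbered 0..n-1, plus a list ("array of pointers") indexing them."""

    def __init__(self, n):
        self.nodes = [Node(i) for i in range(n)]  # O(n) extra references

    def add_edge(self, u, v):
        self.nodes[u].neighbours.append(self.nodes[v])

    def node(self, i):
        return self.nodes[i]  # O(1) lookup by number, no search needed


g = Graph(5)
g.add_edge(0, 4)
g.add_edge(3, 1)
print(g.node(4).label)                          # 4
print([n.label for n in g.node(0).neighbours])  # [4]
```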

The space complexity for the matrix approach would be along the lines of O(n^2) (assuming edges are unidirectional). If your graph is sparse, you will have a lot of empty cells in your matrix. But if your graph is fully connected (e = n^2), this compares favorably with the first approach. As RG says, you may also have fewer cache misses with this approach if you allocate the matrix as one chunk of memory, which could make following a lot of edges around the graph faster.
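Allocating the matrix "as one chunk of memory" usually means a flat row-major buffer indexed as `u * n + v`. The sketch below assumes weights are doubles and uses Python's standard `array` module to get a single contiguous buffer; the `FlatMatrix` class is hypothetical. Python itself hides most cache effects, so treat this as an illustration of the layout rather than a performance demo; the same indexing scheme carries over directly to C or similar languages.

```python
from array import array


class FlatMatrix:
    """n x n edge-weight matrix stored in one contiguous buffer of doubles."""

    def __init__(self, n, no_edge=0.0):
        self.n = n
        self.data = array("d", [no_edge]) * (n * n)  # single allocation, row-major

    def set(self, u, v, w):
        self.data[u * self.n + v] = w  # row u starts at offset u * n

    def get(self, u, v):
        return self.data[u * self.n + v]

    def row(self, u):
        # All outgoing weights of u sit next to each other in memory
        start = u * self.n
        return self.data[start:start + self.n]


m = FlatMatrix(3)
m.set(0, 2, 1.5)
print(m.get(0, 2))     # 1.5
print(list(m.row(0)))  # [0.0, 0.0, 1.5]
```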

The third approach is probably the most space efficient for most cases -- O(e) -- but would make finding all the edges of a given node an O(e) chore. I can't think of a case where this would be very useful.
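A short sketch of why the edge list makes per-node queries expensive, assuming edges are stored as (u, v) tuples in a Python list (the `edges_of` helper is an illustrative name). Memory is just the pairs, but finding the edges that touch one node requires scanning all of them.

```python
def edges_of(edge_list, node):
    """Return every edge touching `node`; must scan the whole list, so O(e)."""
    return [(u, v) for (u, v) in edge_list if u == node or v == node]


edges = [(0, 1), (1, 2), (2, 0), (3, 1)]  # O(e) memory: just the pairs
print(edges_of(edges, 1))  # [(0, 1), (1, 2), (3, 1)]
```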

ajduff574
A: 

Okay, so if edges don't have weights, the matrix can be a bit array, and using bitwise operators can make things go really, really fast in that case.
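A rough Python sketch of the unweighted (bit) matrix, assuming each row is packed into one integer whose bit v is set when edge u -> v exists. The helper names are hypothetical. Here a single bitwise AND of two rows finds the nodes both u and w point to; in a lower-level language the same idea works on machine words, one word covering 32 or 64 nodes at a time.

```python
def make_bit_matrix(n, edges):
    """Each row is an int whose bit v is set when edge u -> v exists."""
    rows = [0] * n
    for u, v in edges:
        rows[u] |= 1 << v
    return rows


def has_edge(rows, u, v):
    return bool((rows[u] >> v) & 1)


def common_successors(rows, u, w):
    """Nodes reachable in one step from both u and w, via one bitwise AND."""
    both = rows[u] & rows[w]
    return [v for v in range(both.bit_length()) if (both >> v) & 1]


rows = make_bit_matrix(4, [(0, 1), (0, 3), (2, 3), (2, 1), (2, 0)])
print(has_edge(rows, 0, 3))           # True
print(common_successors(rows, 0, 2))  # [1, 3]
```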

If the graph is sparse, the object/pointer method seems a lot more efficient. It might also be a good plan to hold the objects/pointers in a data structure designed to keep them in a single contiguous chunk of memory, or to use any other method that keeps them close together.
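One common way to keep a sparse graph "together" in memory is the compressed sparse row (CSR) layout. This technique is swapped in here as one possible option, not something the answer names. The sketch assumes nodes are numbered 0..n-1 and edges are directed; `build_csr` and `neighbours` are illustrative names. All neighbour lists live back to back in one flat `targets` array, and `offsets[u]` to `offsets[u + 1]` marks where node u's slice begins and ends.

```python
from array import array


def build_csr(n, edges):
    """Pack a sparse directed graph into two flat arrays (CSR layout)."""
    counts = [0] * n
    for u, _ in edges:
        counts[u] += 1
    offsets = array("l", [0] * (n + 1))
    for u in range(n):
        offsets[u + 1] = offsets[u] + counts[u]  # prefix sums of out-degrees
    targets = array("l", [0] * len(edges))
    fill = list(offsets[:n])  # next free slot for each source node
    for u, v in edges:
        targets[fill[u]] = v
        fill[u] += 1
    return offsets, targets


def neighbours(offsets, targets, u):
    # u's neighbours form one contiguous slice of `targets`
    return list(targets[offsets[u]:offsets[u + 1]])


offsets, targets = build_csr(4, [(0, 1), (2, 3), (0, 2), (3, 0)])
print(list(offsets))                    # [0, 2, 2, 3, 4]
print(neighbours(offsets, targets, 0))  # [1, 2]
```

The trade-off is that adding an edge later means rebuilding the arrays, so this layout suits graphs that are built once and then read many times.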

The adjacency list - simply a list of connected nodes - seems by far the most memory efficient, but probably also the slowest.
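For comparison, a minimal adjacency-list sketch, assuming a dict that maps each node to a Python list of the nodes it points to (`build_adjacency` is a hypothetical helper). Memory is proportional to the number of edges plus the nodes that have any, but checking one specific edge means scanning that node's list.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each node to the list of nodes it has an edge to."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    return adj


adj = build_adjacency([(0, 1), (0, 2), (2, 1)])
print(adj[0])       # [1, 2]
print(1 in adj[2])  # True; this membership test scans node 2's list, O(deg)
```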

Reversing a directed graph is easy with the matrix representation (you just transpose it) and easy with the adjacency list, but not so great with the object/pointer representation, where every pointer has to be moved to the node it currently points at.
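A small Python sketch of the two easy cases, assuming a square list-of-lists matrix and a list of (u, v) edge pairs. Both function names are illustrative. Reversing the matrix is a transpose, and reversing an edge list is a swap of each pair.

```python
def reverse_matrix(matrix):
    """Reversing every edge of a directed graph = transposing its adjacency matrix."""
    return [list(col) for col in zip(*matrix)]


def reverse_edge_list(edges):
    """With a list of (u, v) pairs, reversing is just swapping each pair."""
    return [(v, u) for (u, v) in edges]


m = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]  # edges 0 -> 1 and 1 -> 2
print(reverse_matrix(m))                    # [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
print(reverse_edge_list([(0, 1), (1, 2)]))  # [(1, 0), (2, 1)]
```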

Dean J