views:

280

answers:

3

Hi, I was wondering what would be the best way to implement undirected graphs (and hence graph traversal) on Google App Engine. I'm currently storing edges in the database as a list, i.e.

class Relation(db.Model):
    connect = db.ListProperty(str, required=True)

but this is notoriously inefficient.

I'm aware of the directed graph question as posed here, but I find that it would not be so suitable for an undirected graph.

A: 

You could store the vertex' ends as a list of keys, but you would need to update two vertex to create a new connection.

class Vertex(db.Model):
    ends= db.ListProperty(db.Key)
    # Other information about the vertex here

If you are not worried about the writing time, this may be a good solution.

jbochi
+1  A: 

I would store the graph as a directed graph, which allows you to use queries more effectively. Obviously you need to have the constraint that all directed edges must have a partnering edge going in the opposite direction.

Class Vertex(db.Model):
   #Vertex specific stuff

Class Edge(db.Model):
   Start = db.ReferenceProperty(Vertex)
   End = db.ReferenceProperty(Vertex)

You can then pull out all of the edges relating to a specific vertex with a simple query:

#getting all neighbours of a vertex called "foo"
Neighbours = Edge.all()
Neighbours.filter("Start = ", foo)
#neighbours now contains a set of all edges leading from foo

Nice and simple, taking advantage of the fact you're using appengine so you can let indexing do a lot of the work for you ;)

If you want to make sure that directed constraint remains true, obviously use a method to create edges like so:

LinkVertices(A, B) #A and B are vertices
   edgeOne = Edge()
   edgeOne.Start = A
   edgeOne.End = B
   edgeOne.put()

   edgeTwo = Edge()
   edgeTwo.Start = B
   edgeTwo.End = A
   edgeTwo.put()

Addressing roffles concerns about storing all the edges twice, you could try something like this:

Class Edge(db.Model):
    A = db.ReferenceProperty(Vertex)
    B = db.ReferenceProperty(Vertex)

#getting all neighbours of a vertex called "foo"
Neighbours = Edge.all()
Neighbours.filter("A = ", foo)
OtherNeighbours = Edge.all()
OtherNeighbours.filter("B = ", foo)
#these two queries now contains all edges leading from foo.

This approach basically saves you storage space (every edge is only stored once) at the cost of slightly higher query times and much messier code. In my opinion that's not a very good tradeoff, since you're saving yourself about 64 bytes storage per edge.

Martin
Oh I like this, though I really would like an undirected graph implementation :)
rfw
Never mind that comment above, I wasn't thinking properly ;)
rfw
you can delete comments for when you have a mild brain explosion ;)Does this satisfy your requirements then?
Martin
I'm concerned about the fact I'll need two edges per link though - 500 undirected edges gets turned into 1000 directed ones. That can't be good for App Engine.
rfw
Well the edges are fairly lightweight in terms of storage space. I wouldn't worry about it - it's CPU time you should be more worried about in my opinion.
Martin
I updated my answer with an alternative solution
Martin
A: 

Forgive me if this misses something about the problem, but why can't you have an edge class that holds max two vertex refs in a list? This would let you use an equality query to fetch all edges for a given vertex and it doesn't require duplication.

Class Edge(db.Model):
   Vertices = db.ListProperty(db.ReferenceProperty(Vertex))

...

edges = Edge.all()
edges.filter("Vertices = ", foo)
Richard Watson