I've been using the low-level datastore API for App Engine in Java for a while now, and I'm trying to figure out the best way to handle one-to-many relationships. Imagine a one-to-many relationship like "Any one student can have zero or more computers, but every computer is owned by exactly one student".

The two options are to:

  • have the student entity store a list of Keys of the computers associated with the student
  • have the computer entity store a single Key of the student who owns the computer

I have a feeling option two is better but I am curious what other people think.

The advantage of option one is that you can get all the 'manys' back without using a Query. One can ask the datastore for all entities using get() and passing in the stored list of keys. The problem with this approach is that you cannot have the datastore do any sorting of the values that get returned from get(). You must do the sorting yourself. Plus, you have to manage a list rather than a single Key.

Option two seems nice because there is no list to maintain. Also, you can sort by properties of the computer as long as there is an index for that property. Imagine trying to get all the computers for a student where the results are sorted by purchase date. With option two it is a simple query, and no sorting is done in our code (the datastore's index takes care of it).
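To make the two access patterns concrete, here is a minimal plain-Java sketch. It uses an in-memory map as a stand-in for the datastore, so it is not the App Engine API; the class, field, and method names (OneToManySketch, Computer, byStoredKeys, byOwner, purchaseYear) are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for the datastore. Every name here is hypothetical;
// none of it is App Engine API.
class OneToManySketch {

    static final class Computer {
        final String key;        // the computer's own key
        final String ownerKey;   // option two: reference to the owning student
        final int purchaseYear;  // property the results should be ordered by

        Computer(String key, String ownerKey, int purchaseYear) {
            this.key = key;
            this.ownerKey = ownerKey;
            this.purchaseYear = purchaseYear;
        }
    }

    // Stand-in for the "Computer" kind, keyed by computer key.
    private final Map<String, Computer> computers = new LinkedHashMap<>();

    void put(Computer c) {
        computers.put(c.key, c);
    }

    // Option one: the student stores a list of keys. Fetch each entity
    // by key, then sort in application code.
    List<Computer> byStoredKeys(List<String> computerKeys) {
        List<Computer> result = new ArrayList<>();
        for (String k : computerKeys) {
            Computer c = computers.get(k);
            if (c != null) {
                result.add(c);
            }
        }
        result.sort((a, b) -> Integer.compare(a.purchaseYear, b.purchaseYear));
        return result;
    }

    // Option two: each computer stores its owner's key. Filter on that
    // reference and return the matches ordered by purchase year. A real
    // datastore would read this order from an index; the scan and sort
    // here only mimic the result.
    List<Computer> byOwner(String studentKey) {
        List<Computer> result = new ArrayList<>();
        for (Computer c : computers.values()) {
            if (c.ownerKey.equals(studentKey)) {
                result.add(c);
            }
        }
        result.sort((a, b) -> Integer.compare(a.purchaseYear, b.purchaseYear));
        return result;
    }

    public static void main(String[] args) {
        OneToManySketch store = new OneToManySketch();
        store.put(new Computer("c1", "s1", 2011));
        store.put(new Computer("c2", "s1", 2009));
        store.put(new Computer("c3", "s2", 2010));

        // Option one needs the key list that would live on the student entity.
        for (Computer c : store.byStoredKeys(Arrays.asList("c1", "c2"))) {
            System.out.println("by keys:  " + c.key + " " + c.purchaseYear);
        }
        // Option two needs only the student's key.
        for (Computer c : store.byOwner("s1")) {
            System.out.println("by owner: " + c.key + " " + c.purchaseYear);
        }
    }
}
```

Both calls return the same computers in the same order. The difference is where the work happens: with option one the caller must already hold the key list and do the sort, while with option two the caller passes only the student's key and the ordering is the store's job.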

Sorting is not really hard, but it is a little more time-consuming (~O(n log n) for a sort) than reading from a sorted index (~O(n) for going through the index). The tradeoff is an index (space in the datastore) for processing time. As I said, my instinct tells me option two is a better general solution because it gives the developer a little more flexibility in getting results back in order, at the cost of additional indexes (which with the Google pricing model are pretty cheap). Does anyone agree, disagree, or have comments?

A: 

Have you considered doing both? Then you could quickly get a list of computers a student owns by key OR use a query which returns results in some sorted order. I don't think maintaining a list of keys on the student model is as intimidating as you think.

Don't underestimate the benefit of fetching entities directly by keys. According to this article, this can be 4-5x faster than queries.
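As a rough illustration of what "doing both" involves, the following plain-Java sketch keeps the student's key list and the computer's owner reference in step on every assignment. It is an in-memory model, not the App Engine API, and the names (BothSidesSketch, assign, computerKeysByStudent, ownerByComputer) are invented for this example. How the two writes are kept consistent in a real datastore is deliberately left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of storing the relationship on both sides.
// All names are hypothetical; this is not App Engine API.
class BothSidesSketch {

    // "One" side: student key -> keys of the computers that student owns.
    final Map<String, List<String>> computerKeysByStudent = new HashMap<>();

    // "Many" side: computer key -> key of the owning student.
    final Map<String, String> ownerByComputer = new HashMap<>();

    // Assign a computer to a student, updating both sides together.
    void assign(String computerKey, String studentKey) {
        String previousOwner = ownerByComputer.put(computerKey, studentKey);
        if (previousOwner != null) {
            // Remove the computer from the previous owner's list so the
            // two sides never disagree.
            List<String> oldList = computerKeysByStudent.get(previousOwner);
            if (oldList != null) {
                oldList.remove(computerKey);
            }
        }
        computerKeysByStudent
                .computeIfAbsent(studentKey, k -> new ArrayList<>())
                .add(computerKey);
    }
}
```

The extra bookkeeping is the small block that handles a change of owner; everything else is one write per side.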

David Underhill
"Don't underestimate the benefit of fetching entities directly by keys. According to this article, this can be 4-5x faster than queries." The article references the System Status page (http://code.google.com/status/appengine/) when it mentions that number. But isn't that comparison a bit unfair? I imagine that gets will typically be used to return a single entity, whereas queries would normally return more data, sometimes much more.
hwiechers
A: 

Both approaches are valid in different situations, though option two - storing a single reference on the 'many' side - is the more common approach. Which you use depends on how you need to access your data.

Nick Johnson