I know all the details of how entity groups work in GAE's storage, but yesterday (at the App Engine meetup in Palo Alto), as a presenter was explaining his use of entity groups, it struck me that I've never really made use of them in my own GAE apps, and I don't recall seeing them used in the open-source GAE apps I've used.

So I suspect I've just been overlooking (not noticing or not remembering) such examples, because I'm simply not used to them enough to immediately connect "use of entity group" to "kind of application problem being solved". I think I should remedy that by studying such sources with this goal in mind, focusing on what problem the EG use is solving (i.e., why the app works with it, but wouldn't work or wouldn't work well without it).

Can anybody suggest good URLs to such code? (Essays would also be welcome, if they focus on application-level problem solving, but not if, like most I've seen, they just focus on the details of how EGs work!-).

+16  A: 

The main use of entity groups is to provide the means to update more than one entity in a transaction.

If you haven't had to use them, count your blessings. Either you have been designing your data models such that no two entities ever need to be updated at the same time in order to remain consistent, or else you do need them but you've gotten lucky :)

Imagine that I have an Invoice entity type, and a LineItem entity type. One Invoice can have multiple LineItems associated with it. My Invoice entity has a field called LastUpdated. Any time a LineItem gets added to my Invoice, I want to store the current date in the LastUpdated field.

My update function might look like this (pseudocode):

invoice.lastUpdated = now()
lineitem = new LineItem()

invoice.put()    # this write may succeed...
lineitem.put()   # ...even if this one fails

What happens if the invoice put() succeeds and the lineitem put() fails? My invoice date will show that something was updated, but the actual update (the new LineItem) won't be there. The solution is to wrap both put() calls in a transaction.
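To make the all-or-nothing behaviour concrete, here is a minimal Python sketch. It uses a toy in-memory store, not the real datastore: `ToyStore`, `add_line_item` and the `fail` flag are hypothetical names made up for illustration, and the toy `run_in_transaction` only borrows its name from `db.run_in_transaction`.

```python
import copy
import datetime


class ToyStore:
    """Toy in-memory store; a stand-in for illustration, not the App Engine API."""

    def __init__(self):
        self.data = {}

    def run_in_transaction(self, func):
        # Apply func to a private copy and publish it only if func returns
        # normally, so a failure halfway through leaves self.data untouched.
        staged = copy.deepcopy(self.data)
        result = func(staged)
        self.data = staged
        return result


def add_line_item(store, invoice_id, item_id, fail=False):
    """Stamp the invoice and add the line item as one all-or-nothing unit."""

    def txn(data):
        data[invoice_id]["last_updated"] = datetime.datetime.now()
        if fail:
            # Hypothetical switch simulating a failed line item put().
            raise RuntimeError("line item put failed")
        data[item_id] = {"invoice": invoice_id}

    store.run_in_transaction(txn)
```

If `add_line_item` is called with `fail=True`, the exception propagates and the invoice's `last_updated` keeps its old value, so the date can never claim an update that did not happen.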

An alternative solution would be to use a query to find the date of the last inserted LineItem, instead of storing this data in the lastUpdated field. But that would involve fetching both the Invoice and all the LineItems every time you wanted to know the last time a lineitem was added, costing you precious datastore quota.

EDIT TO RESPOND TO POSTER'S COMMENTS

Ah. I think I understand your confusion. The above paragraphs establish why transactions are important. But you say you still don't care about entity groups, because you don't see how they relate to transactions. But if you are using db.run_in_transaction, then you are using entity groups, perhaps without realizing it! Every transaction involves one and only one entity group, and any given transaction can only affect entities belonging to that group. See here:

"All datastore operations in a transaction must operate on entities in the same entity group".

What kind of stuff are you doing in your transactions? There are plenty of good reasons to use transactions with just one entity, which by default is in its own entity group. But sometimes you need to keep 2 or more entities in sync, like in my example above. If the Invoice and the LineItem entities are not in the same entity group, then you cannot wrap the modifications to them in a db.run_in_transaction call. So any time you want to operate on 2 or more entities transactionally, you first need to make sure they are in the same group. Hope that makes it clearer why they are useful.
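The same-group rule can be pictured with key paths. The sketch below is a toy model in Python, not datastore code: it assumes a key is a tuple of (kind, name) pairs from the root down and that the root pair identifies the entity group. `CrossGroupError`, `entity_group` and `check_same_group` are hypothetical names.

```python
class CrossGroupError(Exception):
    """Raised when one transaction would touch more than one entity group."""


def entity_group(key_path):
    # The first (kind, name) pair is the root entity, which names the group.
    return key_path[0]


def check_same_group(key_paths):
    """Return the single group shared by all keys, or raise CrossGroupError."""
    groups = {entity_group(path) for path in key_paths}
    if len(groups) > 1:
        raise CrossGroupError("transaction spans groups: %s" % sorted(groups))
    return next(iter(groups), None)


# An invoice, a line item created with that invoice as its parent,
# and a line item created with no parent (a root of its own group).
invoice = (("Invoice", "inv-1"),)
child_item = (("Invoice", "inv-1"), ("LineItem", "li-1"))
root_item = (("LineItem", "li-2"),)
```

Here `check_same_group([invoice, child_item])` passes, because both keys share the root `("Invoice", "inv-1")`, while `check_same_group([invoice, root_item])` raises. That is the situation you are in when the LineItem was not given the Invoice as its parent.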

Peter Recore
I've used db.run_in_transaction (see http://code.google.com/appengine/docs/python/datastore/functions.html) for transactions (or of course get_or_insert for that one special case, see also http://code.google.com/appengine/docs/python/datastore/transactions.html) -- what's the advantage of entity groups wrt that?
Alex Martelli
ouch! my first downvote. the advantage of entity groups wrt transactions is that transactions don't work outside of entity groups. if you have been using transactions, then you have been using entity groups, even if only implicitly. every entity is in a group by default. transactions are limited in scope to affecting one entity group at a time. i have edited my answer with a longer explanation.
Peter Recore
Why is this downvoted? It's a good answer.
Nick Johnson
@Nick and @Peter, I agree, the downvote's one of the many inexplicable and mysterious ones that one sees on SO.
Alex Martelli
Great answer! --- What is your definition of `entity`? --- Is it an HTML `code snippet (ie: "&reg;" for "Registered Trademark") which is interpreted by web browsers to display special characters`? More at http://en.wikipedia.org/wiki/Entity
Masi
Entities are the basic unit of storage in the appengine datastore.
Peter Recore
+1  A: 

I've used them here. I'm setting my customer object as the parent of the map markers. This creates an entity group for each customer and gives me two advantages:

  1. Getting the markers of a customer is much faster, because they're stored physically with the customer object (on the same server, probably on the same disk).

  2. I can change the markers for a customer in a transaction. I suspect transactions require all the objects they operate on to be in the same group because those objects are stored in the same physical location, which makes it easier to implement a lock on the data.
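As a rough illustration of the parent/child layout, here is a toy Python model of looking up one customer's markers by ancestor. It is not the datastore API: keys are assumed to be tuples of (kind, name) pairs, and `query_by_ancestor` plus the sample data are hypothetical.

```python
def query_by_ancestor(entities, kind, ancestor_path):
    """Return values of the given kind whose key path starts with ancestor_path."""
    depth = len(ancestor_path)
    return [
        value
        for path, value in sorted(entities.items())
        if path[-1][0] == kind and path[:depth] == ancestor_path
    ]


# Two customers, each the root of its own group, with markers as children.
entities = {
    (("Customer", "alice"),): "alice",
    (("Customer", "alice"), ("Marker", "m1")): "cafe",
    (("Customer", "alice"), ("Marker", "m2")): "office",
    (("Customer", "bob"),): "bob",
    (("Customer", "bob"), ("Marker", "m1")): "gym",
}
```

With this data, `query_by_ancestor(entities, "Marker", (("Customer", "alice"),))` selects only the two markers under alice, and both of them sit in the one group a transaction on alice's markers would cover.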

Sudhir Jonathan
A: 

I've used them here in this simple wiki system. The latest version of a page is always a root entity, and past versions have the latest version as their ancestor. The copy operation is done in a transaction to keep the versions consistent and avoid losing a version under concurrent edits.
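The key layout described above can be sketched in Python as follows. This is a toy model under stated assumptions, not the wiki's actual code: the store is a plain dict keyed by tuples of (kind, name) pairs, `save_page` and the `"Version"` kind are hypothetical, and staging on a copy only imitates the all-or-nothing part of a transaction (it does no concurrency control).

```python
import copy


def save_page(store, title, new_text):
    """Return a new store with the old text archived and the root updated."""
    # Stage every change on a copy so an error leaves the caller's store intact.
    staged = copy.deepcopy(store)
    root = (("Page", title),)
    page = staged.setdefault(root, {"text": None, "version": 0})
    if page["version"] > 0:
        # Archive the current text as a child of the root (the latest version).
        archived_key = root + (("Version", str(page["version"])),)
        staged[archived_key] = {"text": page["text"]}
    page["text"] = new_text
    page["version"] += 1
    return staged
```

Because the archived copy is a child of the root page, both entities belong to one entity group, which is what allows the copy and the update to share a single transaction in the real datastore.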

moraes