I have a simple domain model, as follows:

Driver - key(string), run-count, unique-track-count

Track - key(string), run-count, unique-driver-count, best-time

Run - key(?), driver-key, track-key, time, boolean-driver-update, boolean-track-updated

I need to be able to update a Run and a Driver in the same transaction, as well as a Run and a Track in the same transaction (to make sure I don't update the statistics twice or miss a counter increment).

I have tried giving the Run a key made up of driver-key/track-key/run-key(string).

This lets me update the Run entity and the Driver entity in one transaction.

But if I try updating the Run and Track entities together, it complains that it cannot transact over multiple entity groups. It says that it has both the Driver and the Track in the transaction and it can't operate on both...

tx.begin();

run = pmf.getObjectById(Run.class, runKey);
track = pmf.getObjectById(Track.class, trackKey);
//This is where it fails;

incrementCounters();
updateUpdatedFlags();
tx.commit();

Strangely enough, when I do a similar thing to update Run and Driver, it works fine.

Any suggestions on how else I can map my domain model to achieve the same functionality?

A: 

With Google App Engine, all of the datastore operations in a single transaction must be on entities in the same entity group. This is because your data is usually stored across multiple tables, and Google App Engine cannot do transactions across multiple tables.

Entities with owned one-to-one and one-to-many relationships are automatically in the same entity group. So if an entity contains a reference to another entity, or a collection of entities, you can read or write to both in the same transaction. For entities that don't have an owner relationship, you can create an entity with an explicit entity group parent.

You could put all of the objects in the same entity group, but you might get some contention if too many users are trying to modify objects in an entity group at the same time. If every object is in its own entity group, you can't do any meaningful transactions. You want to do something in between.

One solution is to have Track and Run in the same entity group. You could do this by having Track contain a List of Runs (if you do this, then Track might not need run-count, unique-driver-count and best-time; they could be computed when needed). If you do not want Track to have a List of Runs, you can use an unowned one-to-many relationship and specify that the entity group parent of the Run is its Track (see "Creating Entities With Entity Groups" on this page). Either way, if a Run is in the same entity group as its Track, you could do transactions that involve a Track and some or all of its Runs.
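
To make the entity-group rule concrete, here is a minimal plain-Java sketch. It is not the App Engine API: KeyPath and its methods are hypothetical names used only for illustration, and the key names are made up. It models the rule that the root of a key's ancestor path identifies its entity group, so a Run keyed under a Driver shares a group with that Driver but not with any Track.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a datastore key: an ordered path of "Kind:name" elements.
// This is NOT the App Engine API, only an illustration of the grouping rule.
final class KeyPath {
    private final List<String> elements;

    private KeyPath(List<String> elements) {
        this.elements = Collections.unmodifiableList(elements);
    }

    // A root key has no parent and starts its own entity group.
    static KeyPath root(String kind, String name) {
        List<String> path = new ArrayList<>();
        path.add(kind + ":" + name);
        return new KeyPath(path);
    }

    // A child key extends its parent's path, so it inherits the parent's group.
    KeyPath child(String kind, String name) {
        List<String> path = new ArrayList<>(elements);
        path.add(kind + ":" + name);
        return new KeyPath(path);
    }

    // The entity group is identified by the first (root) element of the path.
    String entityGroup() {
        return elements.get(0);
    }

    boolean sameGroupAs(KeyPath other) {
        return entityGroup().equals(other.entityGroup());
    }
}

final class EntityGroupDemo {
    public static void main(String[] args) {
        KeyPath driver = KeyPath.root("Driver", "Michael");
        KeyPath track = KeyPath.root("Track", "Monza");
        KeyPath runUnderDriver = driver.child("Run", "20101010");
        KeyPath runUnderTrack = track.child("Run", "20101010");

        System.out.println(runUnderDriver.sameGroupAs(driver)); // true
        System.out.println(runUnderDriver.sameGroupAs(track));  // false
        System.out.println(runUnderTrack.sameGroupAs(track));   // true
    }
}
```

Under this model, a transaction touching a Run and a Track can only succeed if the Run's key path starts at that Track, which is what choosing the Track as the entity group parent achieves.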

For many large systems, instead of using transactions for consistency, changes are made with idempotent operations. For instance, if Driver and Run were not in the same entity group, you could update the run-count for a Driver by first running a query to get the count of all runs before some date in the past, and then, in a transaction, updating the Driver with the new count and the date when it was last computed.

Keep in mind when using dates that machines can have some clock drift, which is why I suggested using a date in the past.
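
A rough sketch of that idempotent recount in plain Java follows. It uses in-memory stand-ins rather than datastore calls: RunRecord, DriverStats, the createdAt field and the recount method are all assumptions made for illustration (the original Run has no timestamp field). The point is that the count is recomputed and overwritten rather than incremented, so repeating the operation cannot double count.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-in for a Run; createdAt is an assumed field.
final class RunRecord {
    final String driverKey;
    final Instant createdAt;

    RunRecord(String driverKey, Instant createdAt) {
        this.driverKey = driverKey;
        this.createdAt = createdAt;
    }
}

// Hypothetical stand-in for the statistics stored on a Driver.
final class DriverStats {
    long runCount;
    Instant countedUpTo;
}

final class IdempotentRecount {
    // Counts the driver's runs created strictly before the cutoff and
    // overwrites the stored value. Calling this again with the same inputs
    // leaves the same state, so a retry after a failure is harmless.
    static void recount(DriverStats stats, String driverKey,
                        List<RunRecord> runs, Instant cutoff) {
        long count = 0;
        for (RunRecord run : runs) {
            if (run.driverKey.equals(driverKey) && run.createdAt.isBefore(cutoff)) {
                count++;
            }
        }
        stats.runCount = count;      // set, never increment
        stats.countedUpTo = cutoff;  // remember how far the count reaches
    }

    public static void main(String[] args) {
        List<RunRecord> runs = Arrays.asList(
            new RunRecord("Michael", Instant.parse("2010-10-10T10:00:00Z")),
            new RunRecord("Michael", Instant.parse("2010-10-10T14:00:00Z")),
            new RunRecord("Michael", Instant.parse("2010-10-12T09:00:00Z")),
            new RunRecord("Other", Instant.parse("2010-10-10T11:00:00Z")));
        Instant cutoff = Instant.parse("2010-10-11T00:00:00Z");

        DriverStats stats = new DriverStats();
        recount(stats, "Michael", runs, cutoff);
        recount(stats, "Michael", runs, cutoff); // retry: same result
        System.out.println(stats.runCount);      // 2
    }
}
```

In a real application the loop would be a datastore query and the two assignments would be a single-entity transaction on the Driver; the cutoff is deliberately in the past to tolerate clock drift between machines.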

NamshubWriter
Thanks for your feedback. I do understand the entity/group/transaction constraint. I am not too keen on having the Track (and the Driver) own a full collection of all the runs, as I expect the Track especially to have a great many runs, and I will constantly need these count statistics whenever I load a Track, but I don't want to load all the runs just to calculate statistics every time. Surely there must be a simple way, even if I sacrifice some contention, which I could deal with. How do I force "all of the objects to be in the same entity group"?
Patrick
Do the statistics absolutely have to be updated in real-time? If not, I can suggest ways you can update them offline. To see how to select the entity group parent for an entity, click on one of the links in my post and see "Creating Entities With Entity Groups"
NamshubWriter
Namshub, thanks again for your feedback. I understand I have to find something in between. The stats do not absolutely have to be real-time, but ideally they would be. That is why I thought I would try to update them in real time and, if that fails, try again later (as long as I can store the fact that it failed). The other complication is that a Run essentially has two separate parents (Track and Driver); this is actually the root cause of the problem, because from what I understand I cannot give it two parents.
Patrick
An entity can only have one entity group parent. Sounds like idempotent operations would meet your needs.
NamshubWriter
Namshub, thanks again for your contribution. My proposed answer solves my problem without the need for idempotent operations. But this is obviously a simplified view of my real model, so I will certainly consider time-based idempotent operations when my problem gets more complex. Thanks again.
Patrick
A: 

I think I found a lateral but still clean solution which still makes sense in my domain model.

The domain model changes slightly as follows:

Driver - key(string-id), driver-stats - ex. id="Michael", runs=17

Track - key(string-id), track-stats - ex. id="Monza", bestTime=157

RunData - key(string-id), stat-data - ex. id="Michael-Monza-20101010", time=148

TrackRun - key(Track/string-id), track-stats-updated - ex. id="Monza/Michael-Monza-20101010", track-stats-updated=false

DriverRun - key(Driver/string-id), driver-stats-updated - ex. id="Michael/Michael-Monza-20101010", driver-stats-updated=true

I can now atomically (i.e. precisely) update the statistics of a Track with the statistics from a Run, either immediately or in my own time (and the same goes for the Driver/Run statistics).
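
As a rough illustration of how the updated flag protects the statistics, here is a plain-Java sketch with in-memory stand-ins. TrackEntity, TrackRunEntity and TrackStatsUpdater are hypothetical names, runCount is carried over from my original model, and I assume a lower time is a better time. In the datastore the body of apply would run inside one transaction, which is possible because the TrackRun key has the Track as its parent.

```java
// Hypothetical in-memory stand-in for the Track entity and its statistics.
final class TrackEntity {
    long runCount;
    long bestTime = Long.MAX_VALUE; // assumption: lower time is better
}

// Hypothetical stand-in for TrackRun, keyed under its Track in the datastore.
final class TrackRunEntity {
    final String id;
    boolean trackStatsUpdated;

    TrackRunEntity(String id) {
        this.id = id;
    }
}

final class TrackStatsUpdater {
    // Applies one run to the track statistics at most once. The flag and the
    // statistics change together, so a repeated or retried call is a no-op.
    // Returns true if the statistics were changed by this call.
    static boolean apply(TrackEntity track, TrackRunEntity trackRun, long runTime) {
        if (trackRun.trackStatsUpdated) {
            return false; // already counted, do nothing
        }
        track.runCount++;
        track.bestTime = Math.min(track.bestTime, runTime);
        trackRun.trackStatsUpdated = true;
        return true;
    }

    public static void main(String[] args) {
        TrackEntity monza = new TrackEntity();
        monza.bestTime = 157;
        TrackRunEntity trackRun = new TrackRunEntity("Monza/Michael-Monza-20101010");

        System.out.println(apply(monza, trackRun, 148)); // true
        System.out.println(apply(monza, trackRun, 148)); // false
        System.out.println(monza.runCount);              // 1
        System.out.println(monza.bestTime);              // 148
    }
}
```

The same shape would apply to Driver and DriverRun, each in its own entity group and its own transaction.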

So basically I have to expand the way I model my problem a little, in a way that is not conventionally relational. What do you think?

Patrick
I personally don't like the idea of duplicating the data just to allow transactions. Ideally, your data model should map to your domain model. If you don't need the stats to be real-time, you can update them offline with idempotent operations.
NamshubWriter
Well, it depends on how you see the model, I suppose. I can actually extract two Run entity types: a TrackRun and a DriverRun. These simply store the association and the "parentUpdated" flags. The RunData itself can then be stored in yet another separate entity, so there is no duplication. RunData is only ever created and never updated. These entities can never be created twice because their id is defined by the user. This still keeps it valid against the real business model. I do need the stats to be as close to real-time as possible (but resilient to failure). Updating my answer...
Patrick
What happens if the write to the TrackRun succeeds and the write to the DriverRun fails?
NamshubWriter
When a run is created by the user for the first time, a few retries may happen automatically in this case. If there is still no luck, the user is notified and can try again as many times as he wishes. There is no risk that a TrackRun (or a DriverRun or DriverData) is created twice, because of its implied string-id (create-if-not-exists). This is the create stage.
Patrick
Then, the stat-update stage can happen at any time. The system will try immediately after the create. But if either update fails, the user is still informed of a success, and the stats will still be updated atomically at a later stage, when the system is less busy or is up again.
Patrick
I implemented this solution and it is doing its job. Thanks, NamshubWriter, for contributing to the discussion.
Patrick