Consider a GAE (python) app that lets users comment on songs. The expected number of users is 1,000,000+. The expected number of songs is 5,000.

The app must be able to:

  • Give the number of songs a user has commented on
  • Give the number of users who have commented on a song

The counters must be managed transactionally so that they always reflect the underlying data.

It seems GAE apps must keep these types of counts calculated at all times since querying for them at request time would be inefficient.

My Data Model

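from google.appengine.ext import db

# BaseModel is not shown in the question; presumably a db.Model subclass.
class Song(BaseModel):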
    name = db.StringProperty()
    # Number of users commenting on the song
    user_count = db.IntegerProperty('user count', default=0, required=True)
    date_added = db.DateTimeProperty('date added', False, True)
    date_updated = db.DateTimeProperty('date updated', True, False)

class User(BaseModel):
    email = db.StringProperty()
    # Number of songs commented on by the user
    song_count = db.IntegerProperty('song count', default=0, required=True)
    date_added = db.DateTimeProperty('date added', False, True)
    date_updated = db.DateTimeProperty('date updated', True, False)

class SongUser(BaseModel):
    # Will be child of User
    song = db.ReferenceProperty(Song, required=True, collection_name='songs')
    comment = db.StringProperty('comment', required=True)
    date_added = db.DateTimeProperty('date added', False, True)
    date_updated = db.DateTimeProperty('date updated', True, False)

Code
This handles the user's song count transactionally but not the song's user count.

s = Song(name='Hey Jude')
s.put()

u = User(email='[email protected]')
u.put()

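def add_mapping(song_key, song_comment, user_key):
    # Runs inside a transaction: the User and its child SongUser
    # belong to the same entity group.
    u = User.get(user_key)

    # SongUser defines `comment` (not `song_comment`) and has no `user`
    # property; the user is recorded as the entity's parent.
    su = SongUser(parent=u, song=song_key, comment=song_comment)
    u.song_count += 1

    u.put()
    su.put()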

# Transactionally add mapping and increase user's song count
db.run_in_transaction(add_mapping, s.key(), 'Awesome', u.key())

# Increase song's user count (non-transactional)
s.user_count += 1
s.put()

The question is: How can I manage both counters transactionally?

Based on my understanding this would be impossible since User, Song, and SongUser would have to be a part of the same entity group. They can't be in one entity group because then all my data would be in one group and it could not be distributed by user.

+1  A: 

You really shouldn't have to worry about handling the user's count of songs on which they have commented inside a transaction because it seems unlikely that a User would be able to comment on more than one song at a time, right?

Now, it is definitely the case that many users could be commenting on the same song at one time, so that is where you have to worry about making sure that the data isn't made invalid by a race condition.

However, if you keep the count of the number of users who have commented on a song inside the Song entity, and lock the entity with a transaction, you are going to get very high contention for that entity, and datastore timeouts will cause your application lots of problems.

The answer to this problem is sharded counters.
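To illustrate the idea, here is a minimal in-memory sketch. It does not use the datastore at all: `ShardedCounter` is a hypothetical class name, and the list of integers stands in for what would be separate shard entities in a real app. The point is only that writes are spread across several shards chosen at random, while a read sums them.

```python
import random


class ShardedCounter(object):
    """In-memory stand-in for a sharded counter (hypothetical, not a GAE API)."""

    def __init__(self, num_shards=20):
        # Each slot stands in for one shard entity in the datastore.
        self.shards = [0] * num_shards

    def increment(self):
        # Pick a shard at random so concurrent writers rarely hit the same one.
        index = random.randint(0, len(self.shards) - 1)
        self.shards[index] += 1

    def total(self):
        # Reading the count means summing every shard.
        return sum(self.shards)


counter = ShardedCounter(num_shards=20)
for _ in range(1000):
    counter.increment()
print(counter.total())
```

In a real app each shard would be its own entity updated in its own transaction, so the number of shards trades write contention against the cost of summing on read.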

In order to make sure that you can create a new SongUser entity and update the related Song's sharded counter, you should consider having the SongUser entity have the related Song as a parent. That will put them in the same entity group, and you can both create the SongUser and update the sharded counter in the same transaction. The SongUser's relationship to the User who created it can be held in a ReferenceProperty.

Regarding your concern about the two updates (the transactional one and the User update) not both succeeding: that is always a possibility. Since either update can fail, you will need proper exception handling to ensure that both succeed. That's an important point: in-transaction updates are not guaranteed to succeed. You may get a TransactionFailedError exception if the transaction cannot complete for any reason.

So, if your transaction completes without raising an exception, run the update to User in a transaction. That will get you automatic retries of the update to User, should some error occur. Unless there's something about possible contention on the User entity that I don't understand, the possibility that it will not eventually succeed is vanishingly small. If that is an unacceptable risk, then I don't think that App Engine has a perfect solution to this problem for you.
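The ordering described above can be sketched in plain Python. This is only an illustration under assumptions: `run_with_retries` and `comment_on_song` are hypothetical helper names, the two callables stand in for the two datastore transactions, and the explicit retry loop stands in for the retries a datastore transaction would give you.

```python
def run_with_retries(update, retries=3):
    """Call update() until it succeeds or `retries` attempts are used up."""
    last_error = None
    for _ in range(retries):
        try:
            return update()
        except Exception as error:  # a real app would catch a narrower type
            last_error = error
    raise last_error


def comment_on_song(create_comment_and_bump_song, bump_user_count):
    # Step 1: create the SongUser and update the song's counter together.
    # If this raises, nothing else runs and the user count is untouched.
    create_comment_and_bump_song()
    # Step 2: only reached when step 1 succeeded; retry the user update.
    return run_with_retries(bump_user_count)


# Simulated usage: the user update fails twice, then succeeds.
attempts = {"count": 0}
user = {"song_count": 0}
log = []


def flaky_bump():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated datastore error")
    user["song_count"] += 1
    return user["song_count"]


result = comment_on_song(lambda: log.append("comment created"), flaky_bump)
print(result)
```

If every retry fails, the exception propagates and the user's count stays one behind the song's data, which is the residual risk mentioned above.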

First ask yourself: is it really that bad if the count of songs that someone has commented on is off by one? Is this as critical as updating a bank account balance or completing a stock sale?

Adam Crossland
Your solution reduces contention, but what I am really trying to do is ensure that both counters match the underlying `SongUser` records. If I use sharded counters for the `Song` entities, I can still have the case when creating a `SongUser` succeeds and incrementing the song's counter fails (or vice versa).
cope360
Noted. I updated my answer to reflect your concern.
Adam Crossland
I think the solution in your last paragraph is probably the best option within the GAE limits. In that solution we flip the example from my first comment. It would now be possible, for example, for the Song counter and SongUser records to be updated/created but for the User record update to fail (or vice versa). Do you agree it is impossible to update both counters (sharded or not) transactionally?
cope360
Updated my answer based on your comment
Adam Crossland
I guess it would have been more fair not to ask this question about something trivial like song comments. I agree that having the user's count perhaps be off by a few is not important. I'm really just trying to learn what I can and can't do with GAE. If I had asked the question about bank balances, then it would have been much easier to arrive at the "No" answer. Of course, there are a lot of other reasons not to use GAE for bank transactions ;)
cope360