I am trying to design tables to build out a follower relationship.

Say I have a stream of 140-character records that each have a user, a hashtag and other text.

Users follow other users, and can also follow hashtags.

I have outlined my design below, but it has two limitations. I was wondering whether others have smarter ways to accomplish the same goal.

The issues with this design are:

  1. The list of followers is copied in for each record
  2. If a new follower is added or one removed, 'all' the records have to be updated.

The code

from google.appengine.ext import db

class HashtagFollowers(db.Model):
    """
    This table contains the followers for each hashtag
    """
    hashtag = db.StringProperty()
    followers = db.StringListProperty()

class UserFollowers(db.Model):
    """
    This table contains the followers for each user
    """
    username = db.StringProperty()
    followers = db.StringListProperty()

class stream(db.Model):
    """
    This table contains the data stream
    """
    username = db.StringProperty()
    hashtag = db.StringProperty()
    text = db.TextProperty()

    def save(self):
        """
        On each save all the followers for each hashtag and user
        are added into a another table with this record as the parent
        """
        super(stream, self).save()
        hfs = HashtagFollowers.all().filter("hashtag =", self.hashtag).fetch(10)
        for hf in hfs:
            sh = streamHashtags(parent=self, followers=hf.followers)
            sh.save()
        ufs = UserFollowers.all().filter("username =", self.username).fetch(10)
        for uf in ufs:
            uh = streamUsers(parent=self, followers=uf.followers)
            uh.save()



class streamHashtags(db.Model):
    """
    The stream record is the parent of this record
    """
    followers = db.StringListProperty() 

class streamUsers(db.Model):
    """
    The stream record is the parent of this record
    """
    followers = db.StringListProperty()

Now, to get the stream of followed hashtags:

    # Keys-only query on the index entities, then fetch their parent stream records
    indexes = db.GqlQuery("""SELECT __key__ FROM streamHashtags WHERE followers = 'myusername'""")
    keys = [k.parent() for k in indexes[offset:numresults]]
    return db.get(keys)

Is there a smarter way to do this?

A: 

You could use a reference property and keep the followers in one common table that each record references.
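
One way to picture this, as a minimal plain-Python sketch rather than App Engine code: each record stores only a key into a shared follower table, so the list is never copied and a follow or unfollow touches a single entry. The names `follower_lists`, `records`, `post` and `visible_to` are hypothetical and exist only for illustration.

```python
# Plain-Python stand-in for a datastore (assumption: not the App Engine API).
# One shared follower table, keyed by the followed user or hashtag.
follower_lists = {
    "user:alice": {"bob", "carol"},
    "hashtag:python": {"bob"},
}

records = []  # stream records hold references (keys), never copies of the lists


def post(username, hashtag, text):
    """Store a record that only references the shared follower lists."""
    record = {
        "username": username,
        "hashtag": hashtag,
        "text": text,
        "follower_refs": ["user:" + username, "hashtag:" + hashtag],
    }
    records.append(record)
    return record


def visible_to(follower):
    """Resolve the references at read time and keep the matching records."""
    return [
        r["text"]
        for r in records
        if any(follower in follower_lists.get(ref, set()) for ref in r["follower_refs"])
    ]


post("alice", "python", "hello")
follower_lists["user:alice"].discard("carol")  # one update, no record rewritten
```

The trade-off in this sketch is that membership is resolved at read time by scanning the records, so it gives up the single indexed lookup on `followers` that the design in the question relies on.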

baloo
A: 

I'm not sure how to do this in Google App Engine, but one database schema I would consider would be:

Tables:
    User    -- a table of users with their attributes
    HashTag -- a table of HashTags with their attributes
    Follows -- a table that defines who follows whom

Columns in the Follows table:
    followed int,         -- the id of the followed entity (could be 
                             User or Hashtag)
    followed_is_user bit, -- whether the followed item is a User
    followed_is_tag bit,  -- whether the followed item is a HashTag
    follower int          -- the id of the follower (this can only be 
                             a User so you may want to make this a foreign 
                             key on the User table)

You could probably condense the two bit columns into one, but this would allow you to add other things that Users could follow in the future.
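
As a rough illustration of the schema above, here is a sketch using Python's built-in `sqlite3` module with an in-memory database. This is a relational model, not the App Engine datastore. The `name` and `tag` columns, the sample rows, and the helpers `followed_users` and `followed_tags` are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE HashTag (id INTEGER PRIMARY KEY, tag  TEXT NOT NULL);
CREATE TABLE Follows (
    followed         INTEGER NOT NULL,  -- id of the followed User or HashTag
    followed_is_user INTEGER NOT NULL,  -- 1 if the followed item is a User
    followed_is_tag  INTEGER NOT NULL,  -- 1 if the followed item is a HashTag
    follower         INTEGER NOT NULL REFERENCES User(id)
);
INSERT INTO User    VALUES (1, 'alice'), (2, 'bob');
INSERT INTO HashTag VALUES (1, 'python');
-- bob (id 2) follows the user alice (id 1) and the hashtag python (id 1)
INSERT INTO Follows VALUES (1, 1, 0, 2), (1, 0, 1, 2);
""")


def followed_users(follower_id):
    """Names of the users that the given user follows."""
    rows = conn.execute(
        "SELECT u.name FROM Follows f JOIN User u ON u.id = f.followed "
        "WHERE f.follower = ? AND f.followed_is_user = 1 ORDER BY u.name",
        (follower_id,),
    )
    return [name for (name,) in rows]


def followed_tags(follower_id):
    """Hashtags that the given user follows."""
    rows = conn.execute(
        "SELECT h.tag FROM Follows f JOIN HashTag h ON h.id = f.followed "
        "WHERE f.follower = ? AND f.followed_is_tag = 1 ORDER BY h.tag",
        (follower_id,),
    )
    return [tag for (tag,) in rows]
```

Because user ids and hashtag ids can collide (both are 1 here), the flag columns are what tell the two kinds of follow apart.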

tgray
The App Engine database implementation is quite unique, and I'm looking for a response specific to it.
molicule
+3  A: 

The problem you want to solve is called the fan-out problem.

Brett Slatkin from the Google App Engine team gave a talk presenting an efficient, scalable solution to the fan-out problem on App Engine. You can find a video of the talk here:

http://code.google.com/events/io/2009/sessions/BuildingScalableComplexApps.html

Mtgred
A: 

Yes, this is the fan-out problem, as others have noted, and anyone interested should watch Brett Slatkin's talk.

However, I raised two specific limitations:

  • The list of followers is copied in for each record

This, as they say, is not a bug but a feature. In fact, this is how fan-out on App Engine scales.

  • If a new follower is added or one removed, 'all' the records have to be updated.

Either that, or do nothing and let the change apply only to future records. In other words, one does not just follow a person's stream; one follows that person's stream as of a given time. So if you unfollow on day two, your stream will still show the records that came in from that user on day one, but not those from day two onwards. [Note: This is different from how Twitter does it.]
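
The "do nothing" option can be sketched in plain Python (not the App Engine API; `user_followers`, `stream_records`, `post` and `timeline` are hypothetical names). Each record freezes the follower list at write time, so a later unfollow leaves older records untouched.

```python
user_followers = {"alice": {"bob"}}  # current follower sets (mutable)
stream_records = []                  # each record carries a frozen snapshot


def post(username, text):
    """Fan out on write: copy the follower list as it is right now."""
    stream_records.append({
        "username": username,
        "text": text,
        "followers": frozenset(user_followers.get(username, ())),
    })


def timeline(follower):
    """Texts of the records whose snapshot includes this follower."""
    return [r["text"] for r in stream_records if follower in r["followers"]]


post("alice", "day one")                 # bob follows alice at this point
user_followers["alice"].discard("bob")   # bob unfollows; old records untouched
post("alice", "day two")

print(timeline("bob"))  # prints ['day one']
```

Here `frozenset` makes the snapshot explicit: the unfollow changes `user_followers` but cannot reach into records that were already written.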

molicule