views:

1409

answers:

3

I'm developing my own social network, and I haven't found on the web examples of implementation the stream of users' actions... For example, how to filter actions for each users? How to store the action events? Which data model and object model can I use for the actions stream and for the actions itselves?

+2  A: 

I think that an explanation on how notifications system works on large websites can be found in the stack overflow question how does social networking websites compute friends updates?, in the Jeremy Wall's answer. He suggests the use of Message Qeue and he indicates two open source softwares that implement it:

  1. RabbitMQ
  2. Apache QPid

See also the question What’s the best manner of implementing a social activity stream?

Nicolò Martini
+6  A: 

I use a plain old MySQL table for dealing with about 15 million activities.

It looks something like this:

id             
user_id       (int)
activity_type (tinyint)
source_id     (int)  
parent_id     (int)
parent_type   (tinyint)
time          (datetime but a smaller type like int would be better) 

activity_type tells me the type of activity, source_id tells me the record that the activity is related to. So if the activity type means "added favorite" then I know that the source_id refers to the ID of a favorite record.

The parent_id/parent_type are useful for my app - they tell me what the activity is related to. If a book was favorited, then parent_id/parent_type would tell me that the activity relates to a book (type) with a given primary key (id)

I index on (user_id, time) and query for activities that are user_id IN (...friends...) AND time > some-cutoff-point. Ditching the id and choosing a different clustered index might be a good idea - I haven't experimented with that.

Pretty basic stuff, but it works, it's simple, and it is easy to work with as your needs change. Also, if you aren't using MySQL you might be able to do better index-wise.


For faster access to the most recent activities, I've been experimenting with Redis. Redis stores all of its data in-memory, so you can't put all of your activities in there, but you could store enough for most of the commonly-hit screens on your site. The most recent 100 for each user or something like that. With Redis in the mix, it might work like this:

  • Create your MySQL activity record
  • For each friend of the user who created the activity, push the ID onto their activity list in Redis.
  • Trim each list to the last X items

Redis is fast and offers a way to pipeline commands across one connection - so pushing an activity out to 1000 friends takes milliseconds.

For a more detailed explanation of what I am talking about, see Redis' Twitter example: http://code.google.com/p/redis/wiki/TwitterAlikeExample

casey
+1 for Redis. v2 uses virtual memory so it should be possible to rely entirely on Redis
stagas
+1  A: 

This is my implementation of an activity stream, using mysql. There are three classes: Activity, ActivityFeed, Subscriber.

Activity represents an activity entry, and its table looks like this:

id
subject_id
object_id
type
verb
data
time

Subject_id is the id of the object performing the action, object_id the id of the object that receives the action. type and verb describes the action itself (for example, if a user add a comment to an article they would be "comment" and "created" respectively), data contains additional data in order to avoid joins (for example, it can contain the subject name and surname, the article title and url, the comment body etc.).

Each Activity belongs to one or more ActivityFeeds, and they are related by a table that looks like this:

feed_name
activity_id

In my application I have one feed for each User and one feed for each Item (usually blog articles), but they can be whatever you want.

A Subscriber is usually an user of your site, but it can also be any object in your object model (for example an article could be subscribed to the feed_action of his creator).

Every Subscriber belongs to one or more ActivityFeeds, and, like above, they are related by a link table of this kind:

feed_name
subscriber_id
reason

The reason field here explains why the subscriber has subscribed the feed. For example, if a user bookmark a blog post, the reason is 'bookmark'. This helps me later in filtering actions for notifications to the users.

To retrieve the activity for a subscriber, I do a simple join of the three tables. The join is fast because I select few activities thanks to a WHERE condition that looks like now - time > some hours. I avoid other joins thanks to data field in Activity table.

Further explanation on reason field. If, for example, I want to filter actions for email notifications to the user, and the user bookmarked a blog post (and so he subscribes to the post feed with the reason 'bookmark'), I don't want that the user receives email notifications about actions on that item, while if he comments the post (and so it subscribes to the post feed with reason 'comment') I want he is notified when other users add comments to the same post. The reason field helps me in this discrimination (I implemented it through an ActivityFilter class), together with the notifications preferences of the user.

Nicolò Martini