views:

144

answers:

7

I have a fairly complex community site (in php/mysql) with a number of "things" that the registered users can "watch".

"Things" are: Posts, comments, event updates, user status changes, etc. At least 10 different types of "things" are available and a user can watch any other user's "things" (Think facebook)

"Watch" means: A user can watch any "thing" and see a feed of all updates / additions to each "thing" across all types of "things".

I have built a similar app with far fewer "things" that can be watched and my setup was basically this:

'watches' table fields: id, userId, itemId [id in foreign table], itemClass [reference id of foreign table]

A user could add a 'watch', a record was created in the 'watches' table and then we'd poll that table (and cache it) so as they browsed we could note (on the UI) if they were already watching a particular item or not, PLUS they could see and manage a list of everything in the system that they were watching. Pretty simple.

So, (after that long-winded intro), this community site we're building is going to need to scale more dramatically. Thousands of users will have hundreds of "things" being watched and we will have to poll not only the 'watches' table per se, but each foreign, join table to get the details to fill each user's watch feed.

If you're confused about what I mean, just think of the facebook friend feed that you have on your facebook home page. How does that get updated without running tons of queries?

A: 

When someone makes an update, it could notify all "watchers" that something has changed, giving it a reference to look up. So when I log in, the system queries my table and finds that some "thing" made an update. It then fetches that update and shows it to me on the page. It's something like a publish/subscribe model.

EDIT (to include my comment):

Each user would have a row in a watch_table associated with them. So I attach myself as a watcher to my girlfriend's status updates, stored in her user record. Then when she updates, iterate through her list of watchers, placing a record in each watcher's record in the watch_table. So when I log in, my watch_table record is queried, and I see in the UI that her status was updated.

The above is using some semantics from facebook status updates.

geowa4
This was the approach that seems most appealing initially, but how do you store that data? We're not "notifying" via email, we're storing the watch activity. If I have 1000 users watching the same thing (likely), then I have to write 1000 rows every time an update is made.
phirschybar
Each user would have a watch_table associated with them. So I attach myself as a watcher to my girlfriend's status updates, stored in her user table. Then when she updates, iterate through her list of watchers, placing a record in each watcher's watch_table. So when I log in, my watch_table is queried, and I see in the UI that her status was updated.
geowa4
I threw that comment in as an edit.
geowa4
@geowa4 - I don't like the idea of a new table for every user. Would be hell to maintain.
phirschybar
sorry, my semantics are a bit off. i meant row... will update
geowa4
+2  A: 

The other approach from your polling would be to set up your system so that "watch updates" were noted after the creation of an item.

The logic would be:

  1. Post is created
  2. As final step in Post creation, a query is run against the watches table for people watching for new Posts
  3. As watches are found you write an entry to that users "activity" or watch feed.
Mike Buckbee
yes - same sort of concept as the answer from geowa4. I really like this. But I am worried about having massive writes to the DB every time an update is made.
phirschybar
flame me if this is sloppy, but what if I just wrote one record per update to the "activity" table and have a comma delimited list of user ids in a "watchers" field. Then I can use MySQL's FIND_IN_SET to get the related people when needed. Is this bad?
phirschybar
Won't you have massive writes either way: at "thing" creation or when you poll the items+watches? WRT your find_in_set solution, you're going to have issues either way, it's a pick your own poison kind of scenario.
Mike Buckbee
No, I would have:* One write for the "thing" creation* One or a couple reads to determine who is watching for the "thing" spawn just created.* Then one write in "activity" with a column called "watchers" where I would store (a potentially huge) delimited list of user ids.So the writes are not the issue here, it's the reading of the "activity" table and having to use FIND_IN_SET against large lists. Am I going about this wrong?
phirschybar
If you use the list of ID's then what happens if a user needs to unsubscribe from a user, then you will find every post from that user and remove the other user's ID out of every post they made?
jasondavis
yes and no. you would watch a user to be notified of any new posts, but you would watch an individual post to watch for new comments. So, it may not be too bad.
phirschybar
A: 

on every save, you'll need to determine if this is a "watched" item and record that in a new table. you'll need to save, what type of item it is, who is watching it, etc. when you need to display any watched items, you query this new table which will be fast.

KM
this seems to be the consensus. and I like it a lot. But the big question for me now is how to store "who is watching it". See my comment about FIND_IN_SET under Mike Buckbee's answer. Thoughts?
phirschybar
A: 

You could use dates to keep track of what a user has or has not seen. When a user logs in, he/she logs in at a certain date/time. When the User logs in again later, you could run a queries on all "things" being "watched" by that user that are newer than the last log-in date. This would be creating the reader list on the fly rather than on every "thing" update.

Robert DeBoer
A: 

In terms of patterns, one thing to keep in mind here is the classical "Listener" / "Event Notification" / "Observer" pattern. You want to loosely couple the folks who are generating the events, and the folks who are listening to them, so that you don't have to modify your event-generators after a new listener is added.

So indeed, firing a notification "I changed!" or "I'm a new post!" after a relevant modification of the observable object, and looking in a table that registered all listeners, seems like a good approach.

Alex
+1  A: 

the best thing to do in terms of scalability and ignoring normalization rules (which you sometimes need to do for performance reasons) would be to store the watches in a table like you already have, but then also add a TEXT column on to the content tables where you store in a delimited fashion all of the id's of the users watching that item.

when the item is updated, you have a handy pre-built list of people to notify for no additional overhead.

and when a user wants to see what they're watching, you have your efficient normalized tables.

vbulletin does this i think for keeping track of who has read a particular thread.

longneck
this is nice. hadn't thought of storing the ids with the actual content. really helpful.
phirschybar
A: 

Take a look at MSFTs SQL Notification Service. Though no longer supported in SQL Sver 2008, its documentation describes a robust notification engine. Something you may, or may not, want to model - depending on your needs.

RobertL