views:

558

answers:

7

Have you ever noticed how facebook says “3 friends and 33 others liked this”? I was wondering what the best approach to do this is. I don’t think going through the friends list, and the list of users who “liked this” and comparing them is efficient at all! Do they keep a track of this in the database? That will make the database size very huge. What do you guys think?

Thanks!

+8  A: 

I would guess they outer join their friends table with their likes table to count both regular likes and friend likes at the same time.

With the proper indexes, it wouldn't be a slow query at all. Huge databases aren't necessarily slow, so there's really no reason to not store all of this information in a database. The trick is to make sure the indexes and partitions (if any) are set up well.

Welbog
how about creating "stats" table which contains [UserId, LikedItemId, NumberOffriendsLiked]. But honestly, I dont feel good about this cause this will generate a massive number of rows in this table!
OneDeveloper
+4  A: 

Yes they definitely keep it in their database as they definitely have more than 1 server that needs to access the data.

As for scalability, I'm sure they use a lot of caching.

Here is an example:

If you have 1 million rows to go through, an index can perform O(logn) = 20 operations (in the worst case) only to find what you need.

For 2 million, you only need 21 operations (in the worst case) to find what you need.

Every time you double the amount of users to go through you simply need only 1 more operation (in the worst case) with a O(logn) index.

They also have a distributed architecture or a clustered database.

Brian R. Bondy
A: 

Each entry that somebody can like probably contains a list of everybody who does like it (all of this is of course in a database). When you view that entry, they match it against your friends list to see which of them is your friend. Voila.

patros
matching the list of people who liked the item against the entire friends list on the fly. I think this is a very expensive operation. Especially, if you’re implementing a timeline-like.
OneDeveloper
It's O(n) where n is the min(# of people who like it, # of people on your friends list). They also likely cache the result after calculating it once.
patros
+2  A: 

In designing social networking software (mothsorchid.com) I found the only way to address this is to pre-cache streams of notifications. One doesn't query the database at the time of page load to count how many friends and others 'liked this', when someone 'likes' something that is recorded on the object, and when retrieving the object one can compare with the current user's friend list. If someone updates their profile/makes a comment/etc it sends notification objects to friends which are pre-cached in their feeds. Cuts down tremendously on database work at expense of disk space, but disk space is cheap.

As to how Facebook does this, they use Cassandra DBMS, which is probably a little different to what you have in mind.

+3  A: 

Facebook uses Cassandra, a NoSQL database for at least some things. Here's a more detailed discussion of what some of the bigger social media sites do to solve these problems:

http://www.25hoursaday.com/weblog/2009/09/10/BuildingScalableDatabasesDenormalizationTheNoSQLMovementAndDigg.aspx

Lots of interesting reading in there if you follow the links from it to the Digg blog post, etc.

Carl Russmann
Not correct, They are using it only for the Inbox searching feature cf: http://www.facebook.com/note.php?note_id=24413138919
Chmouel Boudjnah
+1  A: 

Keep in mind that Facebook strongly utilizes memcached, so they're retaining a lot of data in memory and only refreshing it when absolutely necessary. See this blog post for some scalability discussion around this:

http://www.facebook.com/note.php?note_id=39391378919

The Matt
A: 

A lot of this are explained by the Director of Engineering of Facebook in this QCon presentation :

http://www.infoq.com/presentations/Facebook-Software-Stack

A great presentation to watch.....

Chmouel Boudjnah