views:

11841

answers:

12

I have always wondered how Facebook designed the friend <-> user relation.

I figure the user table is something like this:

 user_email PK
 user_id PK
 password

I figure the table with user's data (sex, age etc connected via user email I would assume).

How does it connect all the friends to this user?

Something like this?

 user_id
 friend_id_1
 friend_id_2
 friend_id_3
 friend_id_N

Probably not. Because the number of users is unknown and will expand.

+14  A: 

It's most likely a many to many relationship:

FriendList (table)

user_id -> users.user_id
friend_id -> users.user_id
friendVisibilityLevel

EDIT

The user table probably doesn't have user_email as a PK, possibly as a unique key though.

users (table)

user_id PK
user_email
password
Nathan Koop
While this certainly makes the most sense, I would think the performance would be horrendous given how many users Facebook has and how many friends each Facebook user has.
Kevin Pang
+9  A: 

Keep a friend table that holds the UserID and then the UserID of the friend (we will call it FriendID). Both columns would be foreign keys back to the Users table.

Somewhat useful example:

Table Name: User
Columns:
    UserID PK
    EmailAddress
    Password
    Gender
    DOB
    Location

TableName: Friends
Columns:
    UserID PK FK
    FriendID PK FK
    (This table features a composite primary key made up of the two foreign 
     keys, both pointing back to the user table. One ID will point to the
     logged in user, the other ID will point to the individual friend
     of that user)

Example Usage:

Table User
--------------
UserID EmailAddress Password Gender DOB      Location
------------------------------------------------------
1      [email protected]  bobbie   M      1/1/2009 New York City
2      [email protected]  jonathan M      2/2/2008 Los Angeles
3      [email protected]  joseph   M      1/2/2007 Pittsburgh

Table Friends
---------------
UserID FriendID
----------------
1      2
1      3
2      3

This will show that Bob is friends with both Jon and Joe and that Jon is also friends with Joe. In this example we will assume that friendship is always two ways, so you would not need a row in the table such as (2,1) or (3,2) because they are already represented in the other direction. For examples where friendship or other relations aren't explicitly two way, you would need to also have those rows to indicate the two-way relationship.

TheTXI
I second this answer!
Poni
+5  A: 

My best bet is that they created a graph structure. The nodes are users and "friendships" are edges.

Keep one table of users, keep another table of edges. Then you can keep data about the edges, like "day they became friends" and "approved status," etc.

belgariontheking
I have a feeling you're going to have to explain that a bit more for some people here.
TheTXI
I think a more interesting question would be how to persist such a huge structure (we're talking about 200 million nodes and billions of edges) in a way that it can easily be searched and updated.
0xA3
@divo: clever use of indexes and partitions.
belgariontheking
+1  A: 

You're looking for foreign keys. Basically you can't have an array in a database unless it has it's own table.


Example schema:

    Users Table
        userID PK
        other data
    Friends Table
        userID   -- FK to users's table representing the user that has a friend.
        friendID -- FK to Users' table representing the user id of the friend
Malfist
Why the downvotes? At least let someone know why you downvoted them.
musicfreak
@freak: Why? The entire concept of voting on this site is for voting to be anonymous. Why do you feel malfist is entitled to anything?
Geoffrey Chetwood
downvotes should leave a comment as to why.
Neil N
Especially when it's a valid answer and is echoed by the other answers (although I didn't copy from them, when I answered, there where no answers)
Malfist
Neil N: There is nothing aside from your personal opinion that that should be the case. There has been numerous attempts at suggesting that commenting should be a requirement and each time it has been met with a resounding NO because voting in general is to be anonymous. It may be polite to leave a comment explaining your vote, but there is nothing stating you must.
TheTXI
@TheTXI: I think comments on downvotes are a courtesy, especially on answers that don't obviously deserve them, but I also agree that comments should not be mandated.
Robert S.
how do you tell who downvoted? Is this something only someone with enough rep can see?
Chris Kaminski
I never said it should be required... but a courtesy would be nice.
Neil N
@darthcoder, no, downvotes are anonymous
Nathan Koop
+3  A: 

Searching a little on Google I found this reverse-engineered object model / DB schema.

However, this is probably far from what you would find in reality on Facebook's servers. They claim to have >200 million users and we can easily guess that the number of FriendRelations will be in the range of billions. So this will need a highly optimized schema and database engine probably only the people working at Facebook will know.

0xA3
+1  A: 

Keep in mind that database tables are designed to grow vertically (more rows), not horizontally (more columns)

Neil N
NEVER FORGET! My dad died because a db table that had grown too far vertically for its columns. I'll miss you Dad.
belgariontheking
hmm, why the downvote? And the comment above this one doesnt make sense.
Neil N
No, the comment doesn't make sense. Seems like someone tried to be funny, so don't mind.
0xA3
http://en.wikipedia.org/wiki/NoSQL
instantsetsuna
+3  A: 

Take a look at these articles describing how LinkedIn and Digg are built:

There's also "Big Data: Viewpoints from the Facebook Data Team" that might be helpful:

http://developer.yahoo.net/blogs/theater/archives/2008/01/nextyahoonet_big_data_viewpoints_from_the_fac.html

Also, there's this article that talks about non-relational databases and how they're used by some companies:

http://www.readwriteweb.com/archives/is_the_relational_database_doomed.php

You'll see that these companies are dealing with data warehouses, partitioned databases, data caching and other higher level concepts than most of us never deal with on a daily basis. Or at least, maybe we don't know that we do.

There are a lot of links on the first two articles that should give you some more insight.

HTH

iKnowKungFoo
A: 

My guess would be something along the lines of a large key-value store for speed. This probably isn't what you'd be doing for a smaller site, as it makes things a lot more complex. For example, there would be something along the lines of:

get_data(userid = 12345) returns: { name: "John Doe", email: ... }
get_friends(userid = 12345) returns: { 22222, 33333, 44444, ... }

When updating friends information, the data on both sides would need to be updated.

FryGuy
+1  A: 

Regarding the performance of a many-to-many table, if you have 2 32-bit ints linking user IDs, your basic data storage for 200,000,000 users averaging 200 friends apiece is just under 300GB.

Obviously, you would need some partitioning and indexing and you're not going to keep that in memory for all users.

Cade Roux
A: 

look this link http://www.flickr.com/photos/ikhnaton2/533233247/

A: 

Well thinking about Oracle, I've heared that is used for military purpose too... Maybe it has something to deal with a lot of datas too. I think is a big advantage, for facebook, to use a relational database instead of a non relatinal.... but obviusly this is my way of thinking

Fire-Dragon-DoL
+3  A: 

Hey Friends,

Its not possible to retrive datas from rdbms for user friends datas for datas which crossed more than half a billion at a constant time so facebook implemented this using hash database(no sql) and they opensourced the database called cassandra so every user has its own key and the friends details in a queue to know how cassandra works look at this http://prasath.posterous.com/cassandra-55

thanks

Very interesting , thank you my friend. When did they switch to cassandra from sql? do you happen to know?
Marin