Hello Im in the midst of creating a social networking site and I would like to know how I would go about creating the relationships between users. Many sites say that I should create a relationship/friend table, but Im looking into the future and believe that this would be ineffective. This idea could be as popular as facebook and I want to be ready for that many users. Facebook has 400 million users so a friends table would be at least 150 times that. Doing a query for ones friends would be very slow I would think. So would the solution be a seperate table for each user containing their friends ID's. or an associated CSV file containing the ID's. Any help would be greatly appreciated to the design of my site. Thanks
Both of the alternatives you've suggested would no doubt result in grief - imagine 400 million tables, or managing 400 million files.
Definitely best to maintain a properly indexed relationships table.
While you think of eventually supporting millions of users, you're only ever seeing a particular persons friends list - that limits the actual amount of data substantially...
In order to maintain normalized friendship relationships in the database, you'd need two tables:
USERS
- user_id (primary key)
- username
FRIENDS
- user_id (primary key, foreign key to USERS(user_id))
- friend_id (primary key, foreign key to USERS(user_id))
This will stop duplicates (IE: 1, 2) from happening, but won't stop reversals because (2, 1) is valid. You'd need a trigger to enforce that there's only one instance of the relationship...
Build the schema you need today, not the one you think you'll need 5 years from now.
Do you think facebook designed their schema to support 400 million users on day one? Of course not. Building for that kind of scale is complicated, expensive, and honestly, if you try it now, you'll probably get it wrong and have to redo it later anyway.
And let's be honest: you have a better chance of winning the lottery than hitting 400 million users any time soon. Even if you do, your project will have hundreds of engineers by then -- plenty of bandwidth for redesigning your schema.
Now's the time to build simple.
Edit to add some solid examples:
They went through a common evolution: single server, went to a single master with multiple read slaves, then partitioned the database, and then settled on a sharding approach.
Keep it simple! Simplicity allows you to rearchitect more quickly so you can respond to problems. It's true that nobody really knows what simplicity is, but if you aren't afraid to make changes then that's a good sign simplicity is happening.
Livejournal also grew from a single database on a single server to multiple sharded replicated databases
I'm sure you could find a dozen more examples on the highscalability blog
If you expect the levels of success attained by Facebook (I like your confidence), you will soon realize what they realized. Relational databases begin to fall short and you'll want to look into NoSQL solutions.
That being said, why pre-optimize for 400 millions users? Build a system that will work now for, say, 500, 000 users. If you need to redesign after that, then you must be very successful and will have the resources to do so.
You could accomplish this using a table to represent the "Relationship" that one user has with another user. This is essentially a JOIN table between two different rows in the same table. An example join table might include the following columns:
- USER_1_ID
- USER_2_ID
To get a list of friends write a query that performs an INNER JOIN from the USER in question to the RELATIONSHIP table back to a second instance on the USER table.
In your code when inserting relationships into table follow a convention.
issueSQLQuery("INSERT INTO relationships (friend1, friend2) VALUES (?, ?)", min(friend_1_ID, friend_2_ID), max(friend_1_ID, friend_2_ID))
And similarly for retrievals. Of course this could be done in a stored procedure.
something like this should do you initially: http://pastie.org/1127206
drop table if exists user_friends;
drop table if exists users;
create table users
(
user_id int unsigned not null auto_increment primary key,
username varchar(32) unique not null,
created_date datetime not null
)
engine=innodb;
delimiter #
create trigger users_before_ins_trig before insert on users
for each row
begin
set new.created_date = now();
end#
delimiter ;
create table user_friends
(
user_id int unsigned not null,
friend_user_id int unsigned not null,
created_date datetime not null,
primary key (user_id, friend_user_id), -- note clustered composite PK
foreign key (user_id) references users(user_id),
foreign key (friend_user_id) references users(user_id)
)
engine=innodb;
delimiter #
create trigger user_friends_before_ins_trig before insert on user_friends
for each row
begin
set new.created_date = now();
end#
delimiter ;
drop procedure if exists insert_user;
delimiter #
create procedure insert_user
(
in p_username varchar(32)
)
proc_main:begin
insert into users (username) values (p_username);
end proc_main #
delimiter ;
drop procedure if exists insert_user_friend;
delimiter #
create procedure insert_user_friend
(
in p_user_id int unsigned,
in p_friend_user_id int unsigned
)
proc_main:begin
if p_user_id = p_friend_user_id then
leave proc_main;
end if;
insert into user_friends (user_id, friend_user_id) values (p_user_id, p_friend_user_id);
end proc_main #
delimiter ;
drop procedure if exists list_user_friends;
delimiter #
create procedure list_user_friends
(
in p_user_id int unsigned
)
proc_main:begin
select
u.*
from
user_friends uf
inner join users u on uf.friend_user_id = u.user_id
where
uf.user_id = p_user_id
order by
u.username;
end proc_main #
delimiter ;
call insert_user('f00');
call insert_user('bar');
call insert_user('bish');
call insert_user('bash');
call insert_user('bosh');
select * from users;
call insert_user_friend(1,2);
call insert_user_friend(1,3);
call insert_user_friend(1,4);
call insert_user_friend(1,1); -- oops
call insert_user_friend(2,1);
call insert_user_friend(2,5);
select * from user_friends;
call list_user_friends(1);
call list_user_friends(2);
-- call these stored procs from your php !!