ansaurus

Question

Database Relationship or friend table design for a social networking site

Answer 1

+4 A:

Both of the alternatives you've suggested would no doubt result in grief - imagine 400 million tables, or managing 400 million files.

Definitely best to maintain a properly indexed relationships table.

Will A 2010-08-30 17:30:21

Answer 2

+4 A:

While you think of eventually supporting millions of users, you're only ever seeing a particular persons friends list - that limits the actual amount of data substantially...

In order to maintain normalized friendship relationships in the database, you'd need two tables:

USERS

user_id (primary key)
username

FRIENDS

user_id (primary key, foreign key to USERS(user_id))
friend_id (primary key, foreign key to USERS(user_id))

This will stop duplicates (IE: 1, 2) from happening, but won't stop reversals because (2, 1) is valid. You'd need a trigger to enforce that there's only one instance of the relationship...

OMG Ponies 2010-08-30 17:37:22

+1 The point about the double-check trigger is often overlooked

Xeoncross 2010-08-30 17:43:25

You'll probabaly need to shard the tables too (if you're preparing for 500 million entries) you'd have an algorithm to figure out what shard to select the data from. For example, having 500 million rows of data and splitting it into 5 million row partitions will dramatically increase the speed of selects/inserts.

Kieran Allen 2010-08-30 17:50:43

How would I use a trigger for controlling this, Ive never used them before?

Dom 2010-08-30 18:23:13

@Dom: That deserves its own question... You'd want a BEFORE INSERT trigger, because you want to validate the data before it's inserted. If the relationship already exists, it should return an error so your application can inform the user that the relationship already exists.

OMG Ponies 2010-08-30 18:35:03

Another approach to the problem of avoiding duplicates would simply be to always sort your inputs application-side, so that user_id < friend_id -- e.g., (2, 1) would always be inserted as (1, 2). A `CHECK` constraint could help you enforce that rule, although apparently that's only an option on non-MySQL databases.

Frank Farmer 2010-09-08 16:41:20

Answer 3

+11 A:

Build the schema you need today, not the one you think you'll need 5 years from now.

Do you think facebook designed their schema to support 400 million users on day one? Of course not. Building for that kind of scale is complicated, expensive, and honestly, if you try it now, you'll probably get it wrong and have to redo it later anyway.

And let's be honest: you have a better chance of winning the lottery than hitting 400 million users any time soon. Even if you do, your project will have hundreds of engineers by then -- plenty of bandwidth for redesigning your schema.

Now's the time to build simple.

Edit to add some solid examples:

Youtube:

They went through a common evolution: single server, went to a single master with multiple read slaves, then partitioned the database, and then settled on a sharding approach.

Keep it simple! Simplicity allows you to rearchitect more quickly so you can respond to problems. It's true that nobody really knows what simplicity is, but if you aren't afraid to make changes then that's a good sign simplicity is happening.

Livejournal also grew from a single database on a single server to multiple sharded replicated databases

I'm sure you could find a dozen more examples on the highscalability blog

Frank Farmer 2010-08-30 17:39:18

+1!!!! Even if you wanted to prepare the best you could you couldn't Facebook has two entire dedicated servers and over 200 memcached instances...

Kieran Allen 2010-08-30 17:45:17

@Frank well said. more developers should understand this.

Alec Smart 2010-08-30 17:48:32

I think you can go a step further and say that you WILL need to redo the schema later on. Data WILL need to be sharded and you'll need to introduce database slaves etc (and develop the schema around those points) It's not simple and will require qualified DBA's to come up with a solution which works for the specific site and load.

Kieran Allen 2010-08-30 17:52:49

@Kieran Allen: Table partitioning implementations vary, but IIRC most don't require major data model updates. I don't care for the idea of planning for a data model overhaul - that should be a *very* last ditch alternative.

OMG Ponies 2010-08-30 17:59:39

Hey guys thanks alot for the responses, they were very helpful. I think I will do what most of you are saying which is to build for the amount of users that I will have now and redesign later when I reach that level. For each relationship someone mentioned duplicates. How do I go about controlling this, I never used triggers before ?

Dom 2010-08-30 18:14:20

@OMG Ponies Good point. However, when partitioning you'll need to take into account the shortcomings (counting all friends, self-joins and larger queries - this all requires quite some planning) The point i'm trying to make is IF you get that big you'll have dedicated DBA's anyway.

Kieran Allen 2010-08-30 18:20:25

@OMG Ponies -- added a couple of examples of big names who started simple and rewrote as they grew.

Frank Farmer 2010-09-08 16:35:23

Answer 4

+2 A:

If you expect the levels of success attained by Facebook (I like your confidence), you will soon realize what they realized. Relational databases begin to fall short and you'll want to look into NoSQL solutions.

That being said, why pre-optimize for 400 millions users? Build a system that will work now for, say, 500, 000 users. If you need to redesign after that, then you must be very successful and will have the resources to do so.

webbiedave 2010-08-30 17:39:20

Answer 5

A:

You could accomplish this using a table to represent the "Relationship" that one user has with another user. This is essentially a JOIN table between two different rows in the same table. An example join table might include the following columns:

USER_1_ID
USER_2_ID

To get a list of friends write a query that performs an INNER JOIN from the USER in question to the RELATIONSHIP table back to a second instance on the USER table.

CitizenForAnAwesomeTomorrow 2010-08-30 17:40:31

Answer 6

+1 A:

In your code when inserting relationships into table follow a convention.

issueSQLQuery("INSERT INTO relationships (friend1, friend2) VALUES (?, ?)", min(friend_1_ID, friend_2_ID), max(friend_1_ID, friend_2_ID))

And similarly for retrievals. Of course this could be done in a stored procedure.

Novikov 2010-08-30 17:41:38

+1 for enforcing order on the relationships table. That's probably the simplest approach to the issue of duplication/reversal.

Frank Farmer 2010-09-08 16:42:44

Answer 7

A:

something like this should do you initially: http://pastie.org/1127206

drop table if exists user_friends;
drop table if exists users;

create table users
(
user_id int unsigned not null auto_increment primary key,
username varchar(32) unique not null,
created_date datetime not null
)
engine=innodb;

delimiter #

create trigger users_before_ins_trig before insert on users
for each row
begin
 set new.created_date = now();
end#

delimiter ;

create table user_friends
(
user_id int unsigned not null,
friend_user_id int unsigned not null,
created_date datetime not null,
primary key (user_id, friend_user_id), -- note clustered composite PK
foreign key (user_id) references users(user_id),
foreign key (friend_user_id) references users(user_id)
)
engine=innodb;

delimiter #

create trigger user_friends_before_ins_trig before insert on user_friends
for each row
begin
 set new.created_date = now();
end#

delimiter ;


drop procedure if exists insert_user;

delimiter #

create procedure insert_user
(
in p_username varchar(32)
)
proc_main:begin

  insert into users (username) values (p_username);

end proc_main #

delimiter ;

drop procedure if exists insert_user_friend;

delimiter #

create procedure insert_user_friend
(
in p_user_id int unsigned,
in p_friend_user_id int unsigned
)
proc_main:begin

  if p_user_id = p_friend_user_id then
    leave proc_main;
  end if;

  insert into user_friends (user_id, friend_user_id) values (p_user_id, p_friend_user_id);

end proc_main #

delimiter ;

drop procedure if exists list_user_friends;

delimiter #

create procedure list_user_friends
(
in p_user_id int unsigned
)
proc_main:begin

  select
    u.*
  from
    user_friends uf
  inner join users u on uf.friend_user_id = u.user_id
  where
    uf.user_id = p_user_id
  order by
   u.username;

end proc_main #

delimiter ;

call insert_user('f00');
call insert_user('bar');
call insert_user('bish');
call insert_user('bash');
call insert_user('bosh');

select * from users;

call insert_user_friend(1,2);
call insert_user_friend(1,3);
call insert_user_friend(1,4);
call insert_user_friend(1,1); -- oops

call insert_user_friend(2,1);
call insert_user_friend(2,5);

select * from user_friends;

call list_user_friends(1);
call list_user_friends(2);

-- call these stored procs from your php !!

f00 2010-08-30 18:55:18

ansaurus

tags:

views:

answers:

Database Relationship or friend table design for a social networking site

USERS

FRIENDS

related questions