I have a social network similar to MySpace, built with PHP and MySQL. I have been looking for the best way to show users only the bulletins posted by themselves and by users they are confirmed friends with.

This involves 3 tables:

  • friend_friend = stores records of who is whose friend
  • friend_bulletin = stores the bulletins
  • friend_reg_user = the main user table, with all user data such as name and photo URL

I will post the bulletin and friend table schemas below. For the user table, I will only list the important fields.

-- Table structure for table friend_bulletin

CREATE TABLE IF NOT EXISTS `friend_bulletin` (
  `auto_id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(10) NOT NULL DEFAULT '0',
  `bulletin` text NOT NULL,
  `subject` varchar(255) NOT NULL DEFAULT '',
  `color` varchar(6) NOT NULL DEFAULT '000000',
  `submit_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `status` enum('Active','In Active') NOT NULL DEFAULT 'Active',
  `spam` enum('0','1') NOT NULL DEFAULT '1',
  PRIMARY KEY (`auto_id`),
  KEY `user_id` (`user_id`),
  KEY `submit_date` (`submit_date`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=245144 ;

-- Table structure for table friend_friend

CREATE TABLE IF NOT EXISTS `friend_friend` (
  `autoid` int(11) NOT NULL AUTO_INCREMENT,
  `userid` int(10) DEFAULT NULL,
  `friendid` int(10) DEFAULT NULL,
  `status` enum('1','0','3') NOT NULL DEFAULT '0',
  `submit_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `alert_message` enum('yes','no') NOT NULL DEFAULT 'yes',
  PRIMARY KEY (`autoid`),
  KEY `userid` (`userid`),
  KEY `friendid` (`friendid`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=2657259 ;

friend_reg_user table fields that will be used:

  • auto_id = the user's ID number
  • disp_name = the user's name
  • pic_url = a thumbnail image path

Requirements:

  • show all bulletins posted by any user ID that is in our friend list
  • also show all bulletins that we posted ourselves
  • scale well; the friends table is several million rows

// 1. Old method: uses a subselect

SELECT auto_id, user_id, bulletin, subject, color, fb.submit_date, spam
FROM friend_bulletin AS fb
WHERE (user_id IN (SELECT userid FROM friend_friend WHERE friendid = $MY_ID AND status =1) OR user_id = $MY_ID)
ORDER BY auto_id

// 2. Another old method, which I used on accounts with a small number of friends. It relies on a separate query that returns a string of all their friends' IDs in this format: $str_friend_ids = "1,2,3,4,5,6,7,8"

select auto_id,subject,submit_date,user_id,color,spam
from friend_bulletin
where user_id=$MY_ID or user_id in ($str_friend_ids) 
order by auto_id DESC

I know these are not good for performance now that my site is getting really large, so I have been experimenting with JOINs.

I believe the JOIN query at the end of this post gets everything I need, except that it must be modified to also return bulletins I posted myself. When I add that to the WHERE clause, it seems to break and returns multiple rows for each bulletin. I think this is because it returns results from users I am a friend of, and then I try to treat myself as a friend, which doesn't work well.

My main point, though, is that I am open to opinions on the best-performing way to do this. Many big social networks have a similar feature that returns a list of items posted only by your friends, so there must be faster ways. I keep reading that JOINs are not great for performance, but how else can I do this? Keep in mind that I do use indexes and have a dedicated database server, but my user base is large and there is no way around that.

SELECT fb.auto_id, fb.user_id, fb.bulletin, fb.subject, fb.color, fb.submit_date, fru.disp_name, fru.pic_url
FROM friend_bulletin AS fb
LEFT JOIN friend_friend AS ff ON fb.user_id = ff.userid
LEFT JOIN friend_reg_user AS fru ON fb.user_id = fru.auto_id
WHERE (
ff.friendid =1
AND ff.status =1
)
LIMIT 0 , 30
A: 

First of all, you can try to partition out the database so that you're only accessing a table with the primary rows you need. Move rows that are less often used to another table.
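As a rough sketch of the "move less-used rows to another table" idea, old bulletins could be copied into an archive table and then deleted from the live one. Everything specific here is an assumption for illustration: the `friend_bulletin_archive` table, the `archive_older_than` helper, the sample rows, and the cutoff date are all hypothetical, and an in-memory SQLite database driven from Python stands in for MySQL so the example runs on its own.

```python
import sqlite3

# SQLite in memory stands in for MySQL; only a few columns of the question's
# friend_bulletin table are reproduced. friend_bulletin_archive is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friend_bulletin (
    auto_id INTEGER PRIMARY KEY, user_id INTEGER, subject TEXT, submit_date TEXT);
CREATE TABLE friend_bulletin_archive (
    auto_id INTEGER PRIMARY KEY, user_id INTEGER, subject TEXT, submit_date TEXT);
""")
conn.executemany(
    "INSERT INTO friend_bulletin VALUES (?, ?, ?, ?)",
    [(1, 10, "older news", "2008-01-05 09:00:00"),   # made-up sample rows
     (2, 11, "old news", "2008-03-01 12:30:00"),
     (3, 10, "fresh news", "2009-08-20 18:45:00")],
)
conn.commit()

def archive_older_than(conn, cutoff):
    """Move bulletins submitted before `cutoff` into the archive table."""
    with conn:  # copy and delete inside one transaction
        conn.execute(
            "INSERT INTO friend_bulletin_archive "
            "SELECT auto_id, user_id, subject, submit_date "
            "FROM friend_bulletin WHERE submit_date < ?", (cutoff,))
        deleted = conn.execute(
            "DELETE FROM friend_bulletin WHERE submit_date < ?", (cutoff,))
    return deleted.rowcount

moved = archive_older_than(conn, "2009-01-01 00:00:00")
live_ids = [row[0] for row in conn.execute(
    "SELECT auto_id FROM friend_bulletin ORDER BY auto_id")]
archived_ids = [row[0] for row in conn.execute(
    "SELECT auto_id FROM friend_bulletin_archive ORDER BY auto_id")]
```

Note that the MyISAM tables in the question do not support transactions, so on MySQL the copy-then-delete step would need InnoDB or some other safeguard against a half-finished move.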

JOINs can impact performance but from what I've seen, subqueries are not any better. Try refactoring your query so that you're not pulling all that data at once. It also seems like some of that query can be run once elsewhere in your app, and those results either stored in variables or cached.
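One possible reading of "not pulling all that data at once" is to fetch the feed a page at a time, combining the friends' bulletins and your own with UNION ALL instead of an OR in the WHERE clause. The sketch below is an assumption, not a tested MySQL solution: SQLite (driven from Python) stands in for MySQL, only three bulletin columns are kept, `status` is stored as an integer rather than an enum, and the composite indexes, `FEED_SQL`, and `feed_page` are hypothetical names. It also assumes friend_friend holds no self-referencing rows and at most one row per pair; otherwise UNION ALL would return duplicates.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; column names follow the question's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friend_friend (userid INTEGER, friendid INTEGER, status INTEGER);
CREATE TABLE friend_bulletin (auto_id INTEGER PRIMARY KEY, user_id INTEGER, subject TEXT);
-- Hypothetical composite indexes covering the two lookups below.
CREATE INDEX ff_friend_lookup ON friend_friend (friendid, status, userid);
CREATE INDEX fb_user_lookup ON friend_bulletin (user_id, auto_id);
""")
conn.executemany("INSERT INTO friend_friend VALUES (?, ?, ?)",
                 [(2, 1, 1), (3, 1, 0), (4, 1, 1)])  # user 3 is not confirmed
conn.executemany("INSERT INTO friend_bulletin VALUES (?, ?, ?)",
                 [(1, 1, "mine"), (2, 2, "friend 2"), (3, 3, "pending 3"),
                  (4, 4, "friend 4"), (5, 9, "stranger")])
conn.commit()

FEED_SQL = """
SELECT fb.auto_id AS auto_id, fb.user_id AS user_id, fb.subject AS subject
FROM friend_bulletin AS fb
JOIN friend_friend AS ff ON ff.userid = fb.user_id
WHERE ff.friendid = :me AND ff.status = 1
UNION ALL
SELECT fb.auto_id AS auto_id, fb.user_id AS user_id, fb.subject AS subject
FROM friend_bulletin AS fb
WHERE fb.user_id = :me
ORDER BY auto_id DESC
LIMIT :limit OFFSET :offset
"""

def feed_page(conn, me, limit=30, offset=0):
    """Return one page of bulletins by `me` and confirmed friends, newest first."""
    params = {"me": me, "limit": limit, "offset": offset}
    return conn.execute(FEED_SQL, params).fetchall()

first_page = feed_page(conn, me=1)
second_page = feed_page(conn, me=1, limit=2, offset=2)
```

The friend lookup in the first half of the query is the part that could be run once elsewhere and cached.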

For example, you can cache an array of friends who are connected for each user and just reference that when running the query, and only update the cache when a new friend is added/removed.
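The cache described above can be sketched as follows. This is only an illustration in Python: `FriendCache`, `load_friend_ids`, and the `friendships` dict are hypothetical names, and a plain in-process dict stands in for whatever shared cache the site would actually use. The sketch drops both users' entries whenever a friendship changes, since a new friendship alters both lists.

```python
class FriendCache:
    """Per-user cache of confirmed friend IDs (a dict stands in for a real cache)."""

    def __init__(self, load_friend_ids):
        self._load = load_friend_ids  # hypothetical callback that queries friend_friend
        self._cache = {}

    def friend_ids(self, user_id):
        # Load from the database only on a cache miss.
        if user_id not in self._cache:
            self._cache[user_id] = frozenset(self._load(user_id))
        return self._cache[user_id]

    def invalidate_pair(self, user_a, user_b):
        # A friendship change affects both users' lists, so drop both entries.
        self._cache.pop(user_a, None)
        self._cache.pop(user_b, None)


friendships = {1: {2, 4}, 2: {1}, 4: {1}}  # toy stand-in for the friend_friend table
db_hits = []

def load_friend_ids(user_id):
    db_hits.append(user_id)  # record each simulated database query
    return friendships.get(user_id, set())

cache = FriendCache(load_friend_ids)
before = cache.friend_ids(1)
cache.friend_ids(1)  # second read is served from the cache

# User 7 adds user 1 as a friend: update the data, then invalidate both sides.
friendships.setdefault(7, set()).add(1)
friendships[1].add(7)
cache.invalidate_pair(7, 1)
after = cache.friend_ids(1)
```

Invalidating rather than updating in place keeps the write path cheap: the next read simply rebuilds the list, whichever user triggered the change.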

It also depends on the structure of your systems and your code architecture; your bottleneck may not be entirely in the db.

BotskoNet
"you can cache an array of friends who are connected for each user and just reference that when running the query, and only update the cache when a new friend is added/removed." I have actually been looking for a practical way of doing this, and I am open to any suggestions. One issue: as you mentioned, I could update this cache when I add a friend, but when another user adds me it would be more difficult, though I'm sure that could be worked around. As for caching an array of friends, I am not sure how I could do this; some users have as many as 50,000+ friends.
jasondavis